
Computer Systems are Dynamical Systems

Todd Mytkowicz, Amer Diwan and Elizabeth Bradley

Department of Computer Science

University of Colorado

Boulder, Colorado

Abstract

In this paper, we propose a nonlinear dynamics-based framework for modeling and analyzing computer systems. Working with this framework, we use a custom measurement infrastructure and delay-coordinate embedding to study the dynamics of these complex nonlinear systems. We find strong indications, from multiple corroborating methods, of low-dimensional dynamics in the performance of a simple program running on a popular Intel processor—including the first experimental evidence of chaotic dynamics in real computer hardware. We also find that the dynamics change completely when we run the same program on a different Intel processor, or when that program is changed slightly. This not only validates our framework; it also raises important issues about computer analysis and design. These engineered systems have grown so complex as to defy the analysis tools that are typically used by their designers: tools that assume linearity and stochasticity, and essentially ignore dynamics. The ideas and methods developed by the nonlinear dynamics community are, we claim, a much better way to study, understand, and design modern computer systems.


Lead Paragraph:

Though it is not necessarily the view taken by those who design them, modern computers are deterministic nonlinear dynamical systems, and it is both interesting and useful to treat them as such. We conjecture that the dynamics of a computer can be described by an iterated map with two components, one dictated by the hardware and one dictated by the software. Using a custom measurement infrastructure to get at the internal variables of this million-transistor system without disturbing the dynamics, we gathered time-series data from a variety of simple programs running on two popular microprocessors, then used delay-coordinate embedding to study the associated dynamics. We found strong indications, from multiple corroborating methods, of a low-dimensional attractor in the dynamics of a simple, three-instruction loop running on an Intel Core2 machine, including the first experimental evidence of chaos in real computer hardware. When we ran the same program on an Intel Pentium 4, the dynamics became periodic; when we reordered the instructions in the program, extra loops appeared in the attractor, increasing its dimension from 0.8 to 1.1 and decreasing its largest Lyapunov exponent by 25%. When we alternated between the two programs, the reconstructed trajectory alternated between the corresponding attractors, leading us to a view of a computer as a dynamical system operating under the influence of a series of bifurcations induced by the code that it is running. Somewhat surprisingly, the topological dimension of the embedding space was similar (m = 10–15) in all of these experiments, which is probably a reflection of the constraints imposed on these systems by the x86 design specification that both machines follow. All of these low dimensions—attractors with dimension ≈ 1 in state spaces with tens of axes—are somewhat surprising in a system as complex as a microprocessor. Computers are organized by design into a small number of tightly coupled subsystems, however, which greatly reduces the effective number of degrees of freedom in the system. These results are not only interesting from a dynamics standpoint; they are also important for the purposes of computer simulation and design. These engineered systems have grown so complex as to defy the analysis tools that are typically used by their designers: tools that assume linearity and stochasticity, and essentially ignore dynamics. The ideas and methods developed by the nonlinear dynamics community are, we claim, a much better way to study, understand, and design modern computer systems.


I. INTRODUCTION

Modern computers are complex nonlinear dynamical systems. Microprocessor registers, memory contents, and even the temperature of different regions of the chip are state variables of these systems. The logic hardwired into a microprocessor, combined with the software executing on that hardware, defines the system's dynamics. Under the influence of these dynamics, the computer's state moves on a trajectory through its high-dimensional state space as the clock cycles progress and the program executes. We call this the performance dynamics of a computer system to distinguish it from the program dynamics: the sequence of steps that the code follows. While the program dynamics are generally simple and easy to understand, the performance dynamics of a program running on a modern computer can be complex and even chaotic. The implications of this are not only interesting, but actually quite important—both from a dynamics standpoint and for the purposes of computer simulation and design.

This thinking is in stark contrast to the traditional approaches in the computer systems literature, where computers are modeled as high-dimensional stochastic processes, temporal details are lumped into aggregate metrics, and the observation and perturbation issues that are inherent in any real measurement process are completely ignored. Only nine of the 364 papers about understanding the performance of microprocessor innovations that appeared between 2004 and 2007 in the top three micro-architecture conferences¹ looked at the time-varying behavior of real hardware. Modern computers have millions of transistors that interact in complex and nonlinear ways, and almost any measurement of their state can perturb their behavior. As a result, the performance dynamics of computer systems can look random—thus giving rise and credence to the idea that a computer's performance evolution comes about from a stochastic process. As readers of this journal are well aware, however, looking random and being random can be two different things if one is using the wrong analysis tools. The dynamics of a computer are dictated by the deterministic physical laws of its parts: wires, semiconductors, and so on. Neither this hardware nor the code that runs on it is stochastic, and so it seems logical to take a dynamical systems approach to the analysis of their coupled behavior.

Following this reasoning, we treat the task of understanding a computer's behavior as equivalent to analyzing the system's state-space dynamics. This is not only a useful way to describe and understand these systems; it also lets us compare two such systems, which is essential to the task of modeling and validating their behavior. We formulate a general model for computer dynamics, in the form of an iterated map. Validating this model is not an easy task; as is the case in many laboratory experiments, it is impossible to measure all of the state variables of a running computer and difficult to avoid perturbing those that can be measured. We use a custom measurement infrastructure to solve this problem and gather time-series data reflecting the performance of a running computer. We employ delay-coordinate embedding on these data to reconstruct the dynamics of various computer programs running on two different physical machines. We analyze the different influences on the dynamics, demonstrating that both hardware and software play complicated, nonlinear roles in the systems' behavior, and we provide the first experimental evidence of low-dimensional chaotic dynamics in real computer hardware. We show that the dynamics of a computer undergo an ongoing series of bifurcations as the execution moves through different parts of the code. It is important to note that not all of this dynamical complexity and richness manifests itself unless one studies a real computer, not just a simulator that mimics its behavior—the common approach in previous work on this topic in both the computer architecture and dynamical systems literatures.

The work described in this paper is not only an interesting application of nonlinear dynamics techniques to a new area. It represents a completely new angle of attack on some of that area's most pressing problems. Currently, for instance, the simulators that are employed by computer architects are validated only using end-to-end metrics like the execution time of a program. The results presented here make it clear that aggregate metrics are inadequate and the details of the dynamics matter: two computer systems should be treated as similar if and only if their dynamical invariants match up. Our results further suggest that bifurcation analysis can be useful in identifying different behavioral regimes in a computer program, and that dynamical invariants can be useful in analyzing the behavior in each regime. In the broader picture, our results suggest that one cannot understand the behavior of a computer by understanding how the hardware and software subsystems function and then composing their dynamics. Instead, one must treat the system as a network of complex, nonlinear, interacting parts and analyze the resulting dynamics as a whole.

This paper is organized as follows: Section II introduces and motivates a dynamical-systems model of computer performance. Section III uses this framework as a basis for investigating the dynamics of a simple program running on an Intel Core2 machine. Section IV repeats the same experiments and analyses using an Intel Pentium 4, demonstrating the effects of hardware on the performance dynamics of the coupled hardware/software system. Section V investigates the dynamics of another simple program, then combines that program with the program of Sections III and IV and analyzes the composed dynamics. Section VI closes with a discussion of the implications of this work for both nonlinear dynamics and computer architecture.

II. A DYNAMICAL SYSTEMS VIEW OF COMPUTER SYSTEMS

Computer software is traditionally written in a high-level language—C, for instance—and then compiled down into a form that a specific type of computer can execute. Computers are grouped into equivalence classes according to the instruction sets that they use, known as Instruction Set Architectures (ISAs). The most popular of these ISAs, known as x86, is used by the Intel Pentium 4 and Intel Core2, among others. An ISA is an abstract specification of what each instruction does; the actual implementation is up to the particular hardware manufacturer. Two different manufacturers, AMD and Intel, implement x86 microprocessors, for instance. Microprocessors from both companies can interpret and run any x86 program, but the way in which they do so is drastically different, since each company has different ideas about how to make the implementation as efficient as possible. Thus, the dynamics of a computer system depend both on the ISA and on the implementation. Indeed, as shown later in this paper, the same software can produce chaotic behavior on one microprocessor and periodic behavior on another, even if they both follow the x86 specification. The following sections describe how to use dynamical systems ideas to understand the interactions that produce this behavior.

A. Computer Dynamics, Part I: State Space

The dynamical differences between computers begin with the state space. An ISA operates on abstract variables, moving data around between them and performing operations upon them. It specifies the low-level byte organization of the data—32 bits, in the case of most x86 instructions. This limits addressable memory in any x86 machine to $2^{32}$ locations, or 4 Gigabytes². We treat this addressable space—a 4GB-long vector of bytes $\vec{m}$—as the state space of the ISA dynamics. But this is only part of the story. As mentioned above, different microprocessor implementations instantiate these 32-bit data in different ways: using different configurations of transistors on the chip, for instance, and different strategies for their organization and use. These design choices not only affect the dynamics, but actually introduce extra state variables into the system. Some of these are commonplace in physics problems: temperature, for instance, can affect how fast a computer runs. Other implementation-related influences upon the dynamics are far more complicated. Modern computers attempt to anticipate what data a program will need next, for instance, and "push" it to faster memory. The computer will act differently, then, if a variable in $\vec{m}$ is in one memory location rather than another—not in terms of the program dynamics, from the standpoint of the ISA, but in terms of performance. Though one cannot know or measure all of these effects, they play critical roles in the dynamics; we include them in the framework presented here by adding a vector of unknown implementation variables $\vec{u}$ to the state space of the system.

The following sections describe the specification and implementation dynamics in more detail, filling in some details about both $\vec{m}$ and $\vec{u}$.

B. Computer Dynamics, Part II: Specification

In the abstract world of a specification like x86, a computer program is a sequence of instructions that affect $\vec{m}$ in different ways. That is, the state evolves under the influence of a deterministic update rule (the program's instructions) in discrete time (one instruction per time step). From a dynamical systems perspective, this is an iterated map or cellular automaton:

    $\vec{m}_{n+1} = \vec{F}_{code}(\vec{m}_n)$    (1)

where $\vec{F}_{code}$ is the deterministic update rule and $\vec{m}_i$ is the state of the ISA variables at time i. It is well known among dynamicists that iterated maps like this can exhibit complex and interesting dynamics. Authors of computer programs are also well aware of their sensitivity to small changes (e.g., bugs), but they generally try to steer clear of the "interesting" dynamics by writing highly structured, periodic code—loops, subroutines, and so on.
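
As a concrete illustration of how little machinery an iterated map needs in order to behave in a complicated way, consider the one-variable logistic map. The sketch below is a standard textbook example (it is not from the computer experiments in this paper): two nearby states diverge under repeated application of a simple deterministic update rule.

    #include <stdio.h>
    #include <math.h>

    /* Iterate m_{n+1} = F(m_n) with F the logistic map at r = 4 (chaotic),
       starting from two states that differ by one part in 10^12. */
    int main(void) {
        double a = 0.3;
        double b = 0.3 + 1e-12;
        for (int n = 0; n < 50; n++) {
            a = 4.0 * a * (1.0 - a);
            b = 4.0 * b * (1.0 - b);
        }
        /* After 50 iterations the two trajectories are macroscopically apart. */
        printf("|a - b| = %g\n", fabs(a - b));
        return 0;
    }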

The ISA is only an abstraction, however, as mentioned above. Depending on how it is implemented, an x86 computer running the same program—even a very simple one—can behave very differently, as shown later in this paper. Any dynamical model of this system, then, must include implementation effects.

C. Computer Dynamics, Part III: Implementation

An implementation, which instantiates an ISA in the form of silicon and metal on a semiconductor die, can be an extremely complex mapping. At the time of this writing, modern computer microprocessors have on the order of 100 million transistors, organized in a highly complicated manner. Most microprocessors use a multi-level memory design—disk, RAM, cache, etc.—that allows progressively faster access to frequently used data. Many attempt to anticipate resources that future instructions will need (e.g., a particular piece of data) and have it available (e.g., in the fastest memory level, the cache) when those instructions begin execution. Some microprocessors attempt to identify instructions that can execute in parallel, then distribute them across multiple execution pipelines—so-called "out-of-order" execution. All of these complex nonlinear processes are managed by proprietary logic that is built into the microprocessor—and not visible to those who do not work for the hardware manufacturer.

The design strategies that produce these dynamics have greatly improved computer performance, but at a significant expense: computer architects can no longer completely understand their creations. Recently, for example, some design changes that were expected to be improvements proved not to work well (e.g., the "trace cache" or hyper-threading on the Pentium 4, both of which can unexpectedly degrade the chip's performance[1]). The models and mathematics that are used in this field were unable to predict this effect, or even to analyze it post facto, suggesting that they are not up to the challenge of analyzing modern computers. Dynamical systems, however, offers a useful model for this situation—a deterministic, nonlinear, iterated map—together with powerful analytical tools for understanding the associated dynamics.

As a model of computer dynamics, equation (1) is only part of the picture. The full dynamics are a complicated function of both the code and the microprocessor on which it is running, acting upon the state vector $\vec{x} = \{\vec{m}, \vec{u}\}$ introduced in Section II A:

    $\vec{x}_{n+1} = \vec{F}_{code} \circ \vec{F}_{impl}(\vec{x}_n)$    (2)

where $\vec{F}_{code}$ is the program dynamics described in the previous section and $\vec{F}_{impl}$ models the dynamical effects of the implementation processes that are described in this section. This model of computer dynamics provides a complete description of how a particular program executes on a particular microprocessor, bringing out the roles that each plays in the process. Of course, we cannot know (let alone measure) all of the variables that make up $\vec{x}$, nor can we know or deduce $\vec{F}_{impl}$. The dimension of the system is potentially enormous: the ISA state vector $\vec{m}$ alone has on the order of $4 \times 10^9$ elements, and the physical state space $\vec{u}$ could be even larger. Even so, this is a useful framework. The dynamics of the computers studied in this paper appear to occupy a low-dimensional subspace of the full state space. Given the formulation of equation (2), then, delay-coordinate embedding and other dynamical systems techniques can be used to examine the evolution of $\vec{x}_n$ and elucidate some of the features of the function $\vec{F}_{code} \circ \vec{F}_{impl}$, including the individual roles of $\vec{F}_{code}$ and $\vec{F}_{impl}$. This study, reported in Sections III–V for different $\vec{F}_{code}$ and $\vec{F}_{impl}$, is not only an interesting exploration of a complex dynamical system; it provides a means to compare two computer systems—something that computer architects currently cannot do in a truly effective way.

D. Measuring Computer Dynamics

There are three challenges in tracking the performance of a computer: state-space size, measurement perturbation, and observability. Simply dumping the main memory of a 1GB machine running at 2.4 gigahertz once every 100,000 cycles, for instance, would produce a terabyte of data every 40 milliseconds. Producing this data would dominate the computer's dynamics; processing it would overwhelm any analysis tool. And since one must use the computer's own facilities to measure its internal state, any measurement can easily change the dynamics that one wants to examine. And many of a computer's internal variables are simply inaccessible.

The practice in computer performance analysis is to measure a limited number of internal variables using the hardware performance monitoring facility (HPM) that is embedded in almost all modern microprocessor chips. This facility typically contains two to eight dedicated registers, each of which can count instances of a different user-programmed event. Using these registers, one can capture the total number of instructions executed per cycle (IPC), for instance, or the total number of references to the data cache. These are the most widely used metrics in the computer performance analysis literature. IPC is a good way to study the performance of modern microprocessors, most of which can execute more than one instruction per clock cycle. Cache-access data is an effective way to study the dynamical role of a program's memory use, which is a key bottleneck in computer performance.

The HPM facility maintains a running count of events. To capture that information and save it for later use, we wrote a monitoring tool that periodically stops a running program, reads the current values in the HPMs, and stores them to disk. Because the HPM registers are in hardware, the counting of events does not perturb the running program. Storing these counters in memory, though, or on disk, can affect these shared structures and thus introduce noise into our measures. To avoid this insofar as possible, we sample the HPMs infrequently and check the effect of the sampling frequency on the dynamics. To further reduce noise, our tool only monitors hardware events when the target program is running, and not when the operating system (or the monitoring tool itself) has control of the microprocessor. We follow best practices from the computer performance analysis community when measuring the system: we only use local disks and limit the number of other processes that are running on the machine.
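
The authors' measurement tool is custom infrastructure; as a rough modern analogue (ours, not the paper's tool), the same kind of counter can be read on a Linux system through the kernel's perf_event interface. The sketch below counts retired user-mode instructions around a stretch of code, excluding kernel activity much as the text describes.

    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    int main(void) {
        /* Configure one hardware counter: retired instructions, user mode only
           (analogous to the paper's practice of not counting OS activity). */
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_INSTRUCTIONS;
        attr.disabled = 1;
        attr.exclude_kernel = 1;
        attr.exclude_hv = 1;

        int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        /* ... run the code being measured here ... */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t count = 0;
        if (read(fd, &count, sizeof(count)) == sizeof(count))
            printf("instructions retired: %llu\n", (unsigned long long)count);
        close(fd);
        return 0;
    }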

Given measurements of a single state variable, obtained in this fashion, one can use delay-coordinate embedding to reconstruct the internal dynamics of the system. The Takens–Whitney–Mañé theorems guarantee that such an embedding, if done correctly, is topologically equivalent to the underlying dynamics. Because dynamical invariants like the largest Lyapunov exponent are invariant under diffeomorphism, one can calculate the invariants of the embedded dynamics and extrapolate the results to the true dynamics.
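
Mechanically, the embedding itself is simple. The sketch below (our own illustration, with hypothetical names) builds the m-dimensional delay vectors from a scalar series, given a delay tau and an embedding dimension m chosen by the methods described later in the paper.

    #include <stddef.h>

    /* Delay-coordinate embedding: from a scalar series s[0..n-1], build the
       m-dimensional vectors x_i = (s[i], s[i+tau], ..., s[i+(m-1)*tau]).
       out must hold (n - (m-1)*tau) * m doubles; returns the vector count. */
    size_t embed(const double *s, size_t n, size_t m, size_t tau, double *out) {
        if (n < (m - 1) * tau + 1) return 0;
        size_t count = n - (m - 1) * tau;
        for (size_t i = 0; i < count; i++)
            for (size_t k = 0; k < m; k++)
                out[i * m + k] = s[i + k * tau];
        return count;
    }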

Of course, the number of instructions executed or memory accesses made during a given clock cycle is most likely not a state variable of a microprocessor, but rather a complex nonlinear function of multiple state variables of that system. Even so, we believe it is appropriate to use either IPC or memory accesses (specifically, unsuccessful accesses to the fastest level of the memory, the cache) for delay-coordinate embedding, for two reasons. First, we repeated all of our analysis with both of these measures and found almost identical values for every dynamical invariant. Second, our dynamical invariant results corroborate each other nicely. Detailed results appear in the later sections of this paper. Because neither the data nor the numerical tools for calculating these invariants are perfect, these kinds of multiple corroborating measures are critical to establishing confidence in the methods—and in the results.

Time is an interesting issue in a system like a computer, where both discrete (digital) and continuous (analog) dynamics are in play. The time scale for the discrete-time dynamics is imposed by the internal clock on the chip. Designers intentionally choose the clock cycle to be larger than the time scales of the continuous dynamics in order to ensure that the discrete-time dynamics dominates the behavior of the system. This assumption, which holds in the normal operating regimes with which this paper is concerned, is implicit in equation (2). Because instructions and clock cycles are not 1:1, though, even the discrete time scales require some thought, particularly in view of the embedding theorems. The uniform-sampling requirements of the Takens–Whitney–Mañé framework suggest that one sample the computer's state every n clock cycles and use the standard delay-coordinate methods on the resulting data. Alternatively, one can view instructions as events—"spikes," in the framework of [2]—and use inter-spike interval (ISI) embedding on the intervals between them to reconstruct the dynamics. The latter approach actually makes more sense here, since the completion of an instruction on a modern computer is the result of something very much like the integrate-and-fire process assumed by the ISI embedding theorems. Sampling according to instruction, rather than clock cycle, also facilitates meaningful comparison between machines that use different clock-cycle times. For these reasons, all of the experiments in this paper define the sampling rate in terms of instructions³.
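
In this instruction-based view, the ISI series is just the cycle count between consecutive "spikes." A minimal sketch (ours, not the authors' tool) is below; the resulting interval series corresponds to the CYCLES metric defined in Section III.

    #include <stddef.h>

    /* Inter-spike intervals: c[k] is the cumulative cycle count recorded when
       the k-th "spike" (every 100,000th completed instruction) fires. The
       interval series isi[k] -- cycles between spikes -- is what gets embedded. */
    void isi_series(const unsigned long long *c, size_t n, double *isi) {
        for (size_t k = 0; k + 1 < n; k++)
            isi[k] = (double)(c[k + 1] - c[k]);
    }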

III. DYNAMICS OF A SIMPLE LOOP

The following fragment of C code initializes the elements of the upper triangle of a matrix to zeroes. It works in column-major order, incrementing a row index j and a column index i via a nested loop:

    col_major:
      for (i = 0; i < 255; i++)
        for (j = i; j < 255; j++)
          data[j][i] = 0;


FIG. 1: The effect of sampling rate on the measured dynamics: (a) runtime as a function of sampling interval, normalized to the unsampled runtime; (b) power spectra of data measured at 100,000-cycle (top) and 200,000-cycle (bottom) sampling intervals. The data in part (b) were normalized so as to align the frequency axes.

Though this code is simple, it is interesting for several reasons, including the fact that its performance dynamics are dominated by how it uses the computer's memory. Its column-major access pattern interacts badly with the way in which computers anticipate what data will be needed next and fetch it into the memory cache. As a result, this code causes a large number of cache misses, which slows it down.

A. Methods

We compiled this loop on an Intel Core2 processor with the gcc compiler, version 4.1, optimization level "-O2." To explore its dynamics, we ran it repeatedly and sampled the state of the microprocessor every 100,000 instructions. To record each sample, our sampling tool stopped the program's execution, took over control of the microprocessor, and recorded two pieces of information: the total number of clock cycles (CYCLES) that were required to run those 100,000 instructions, and the number of cycles (CACHE) in that interval during which the L2 data cache experienced a miss, forcing the processor to fetch data from main memory. These measures were then stored on disk for later processing, and the sampling tool handed control of the microprocessor back to the col_major program.

A complicating factor in all of the experiments reported here is that the measurement infrastructure is part of the environment that it is measuring. To check whether that is an issue—that is, whether the measurement process is affecting the system dynamics—we varied the sampling rate and observed the effects on the program's behavior. In terms of overall runtime, the impact is minimal; as shown in Figure 1(a), the col_major program runs only 0.03% faster when the sampling period is doubled from 100,000 to 200,000 cycles. (Interestingly enough, this behavior appears to follow a scaling law. Scaling laws turn up in a variety of situations, from earthquakes[3] to star formation[4]; it is unclear what the underlying mechanism is in this case, but we are in the process of developing new measurement techniques that will let us find out.) Changing the sampling rate did not affect the power spectrum of the data, as shown in Figure 1(b), which is stronger evidence that the sampling process is not changing the dynamics. To ensure that we are not sampling too seldom—an obvious concern here, since all of these intervals are extremely long in comparison to the execution time of the col_major loop—we verified that the autocorrelation of the time-series data reflected significant correlation between successive samples. Taken together, this set of tests indicates that the measurement methods used here are indeed producing an accurate picture of the dynamics.
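
For reference, the autocorrelation check is conceptually simple; the sketch below (ours, not the authors' tool) estimates the lag-k sample autocorrelation, which should remain significantly above zero at small lags if the sampling is dense enough.

    #include <stddef.h>

    /* Lag-k sample autocorrelation of a series s[0..n-1]. */
    double autocorr(const double *s, size_t n, size_t k) {
        double mean = 0.0, var = 0.0, cov = 0.0;
        for (size_t i = 0; i < n; i++) mean += s[i];
        mean /= (double)n;
        for (size_t i = 0; i < n; i++) var += (s[i] - mean) * (s[i] - mean);
        for (size_t i = 0; i + k < n; i++) cov += (s[i] - mean) * (s[i + k] - mean);
        return (var > 0.0) ? cov / var : 0.0;
    }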

B. Dynamics

The time-series data for the col_major program's cache behavior (CACHE) is shown in Figure 2(a). This plot covers the first 100 samples (100 × 100,000 instructions) of the program's evolution on this microprocessor, capturing the number of cycles in each sampling period during which a cache miss occurred. Note that even though the program evolution is highly periodic—a simple nested loop—the performance dynamics, as sampled by the CACHE metric, are not completely periodic, but rather irregular in nature. The CYCLES measure (not shown) is likewise irregular.

We used delay-coordinate embedding to reconstruct the state-space dynamics from the time-series data in Figure 2(a). We followed standard procedures to choose appropriate embedding parameters: the first minimum of the mutual information curve[5] as an estimate of the delay τ and the false-nearest neighbors technique[6], with a threshold of 0.1%, to estimate the embedding dimension m. The resulting values, obtained using TISEAN's mutual and false_nearest tools[7], were τ = 1 (i.e., 100,000 instructions) and m = 12, respectively.

FIG. 2: Memory accesses during the execution of col_major on an Intel Core2: (a) time series of the total number of cycles in each sampling interval during which the L2 cache is busy fetching data from main memory; (b) two-dimensional projection of the data from (a), embedded with τ = 1 and m = 12.

Figure 2(b) shows a two-dimensional projection of the reconstructed dynamics. This geometry is robust, remaining unchanged over multiple runs of various lengths from different initial conditions⁴. The islands of points are not a strobing artifact; they persist if the run length and/or sampling interval are changed. Taken together, these facts suggest that the performance dynamics of col_major has an attractor—one with interesting, low-dimensional geometry.
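
For readers who want the mechanics behind the τ choice, the sketch below (ours; TISEAN's mutual implements a more careful version of the same idea) estimates the average mutual information between s(t) and s(t+τ) with a simple joint histogram; one scans τ upward and takes the first minimum.

    #include <stdlib.h>
    #include <math.h>

    /* Histogram estimate of the mutual information I(tau) between s_t and
       s_{t+tau}, using a B x B joint histogram. Returns I in nats. */
    double mutual_info(const double *s, size_t n, size_t tau, int B) {
        double lo = s[0], hi = s[0];
        for (size_t i = 0; i < n; i++) {
            if (s[i] < lo) lo = s[i];
            if (s[i] > hi) hi = s[i];
        }
        double w = (hi - lo) / B + 1e-12;           /* bin width */
        size_t N = n - tau;                         /* number of pairs */
        double *joint = calloc((size_t)B * B, sizeof(double));
        double *px = calloc((size_t)B, sizeof(double));
        double *py = calloc((size_t)B, sizeof(double));
        for (size_t i = 0; i < N; i++) {
            int x = (int)((s[i] - lo) / w), y = (int)((s[i + tau] - lo) / w);
            joint[x * B + y] += 1.0 / N;
            px[x] += 1.0 / N;
            py[y] += 1.0 / N;
        }
        double I = 0.0;
        for (int x = 0; x < B; x++)
            for (int y = 0; y < B; y++)
                if (joint[x * B + y] > 0.0)
                    I += joint[x * B + y] * log(joint[x * B + y] / (px[x] * py[y]));
        free(joint); free(px); free(py);
        return I;   /* scan tau = 1, 2, ... and take the first minimum */
    }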

To explore this conjecture, we calculated four standard invariants from the embedded data, beginning with the correlation exponent v. This exponent, a measure of the local structure of the dynamics, has been shown to be a lower bound on the attractor dimension[8]. To calculate it, we used TISEAN's d2 tool, which computes the correlation sum C(m, ε) as a function of neighborhood size (ε) for many values of m using a box-counting algorithm. If a scaling region exists in this curve over a range of ε values, the slope in that region corresponds to v. For the embedded col_major CACHE data, as shown in Figure 3(a), C(m, ε) does indeed contain a scaling region over the range 280 ≤ ε ≤ 1400 with slope v = 0.86 ± 0.1. As a corroboration of both this v value and the correctness of the embedding, we repeated this calculation for a range of embedding dimensions 1 ≤ m ≤ 25, hence the multiple traces in the figure (plotted in black for 15 ≤ m ≤ 25 and grey for m ≤ 15 to show the pattern with increasing m). The extent of the scaling region—across a range of both m and ε—increases our confidence in the v calculation; the asymptotic pattern in the geometry of the curves with changing m increases our confidence in the quality of the embedding. (The salient features of the curves changed until m ≈ 15, which is higher than the m = 12 estimate from the false-nearest neighbor method. This is discussed below.)

FIG. 3: Calculations of the dynamical invariants of Figure 2(b): (a) correlation sum C(m, ε) vs. neighborhood size ε; (b) correlation dimension, d ln C(ε)/d ln ε vs. ε; (c) largest Lyapunov exponent, S(ε, m, t) vs. time; (d) correlation entropy h₂(m) vs. ε. The multiple traces reflect runs of the associated calculations with different m values for corroboration purposes, as described in the text.
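
The quantity underlying both panels (a) and (b) of Figure 3 is the correlation sum. A naive O(N²) sketch (ours; it uses the Euclidean norm and omits the Theiler window, unlike TISEAN's d2) is below; the correlation exponent v is the slope of log C versus log ε in the scaling region.

    #include <stddef.h>

    /* Correlation sum C(eps) = 2/(N(N-1)) * #{(i,j), i<j : ||x_i - x_j|| < eps}
       over N m-dimensional delay vectors stored row-wise in x. */
    double corr_sum(const double *x, size_t N, size_t m, double eps) {
        size_t pairs = 0;
        for (size_t i = 0; i < N; i++)
            for (size_t j = i + 1; j < N; j++) {
                double d2 = 0.0;
                for (size_t k = 0; k < m; k++) {
                    double d = x[i * m + k] - x[j * m + k];
                    d2 += d * d;
                }
                if (d2 < eps * eps) pairs++;
            }
        return 2.0 * (double)pairs / ((double)N * (double)(N - 1));
    }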

Next, we calculated the correlation dimension by plotting the local slopes of the correlation sum against the log of the neighborhood size ε. A horizontal region across the middle range⁵ of this curve indicates the D2 value. Using this method on the embedded col_major CACHE data, we obtained the results shown in Figure 3(b). It is notoriously difficult to judge scaling regions in curves like this, which include bumps and dips because of local effects of the data and the algorithms. The scaling region in this particular set of curves is actually much cleaner than in most textbook examples (e.g., [7, pp. 82–85]), and its geometry persists over a range of m values (viz., the different curves in the plot). Using the range 200 ≤ ε ≤ 1200, we calculated D2 = 0.83 ± 0.03. This result is consistent with the known[8] relationship between D2 and v: v → D2 as ε → 0. This pair of dimensions, then, strongly suggests that the dynamics of col_major on an Intel Core2 has an attractor whose dimension is around 0.8.

Since the dynamics have a clear pattern but are not periodic, the obvious next question is whether this system is chaotic. To explore this, we calculated the largest Lyapunov exponent λ1, a measure of the sensitivity of the dynamics to perturbations, using the algorithm of Rosenstein et al.[9]. Figure 3(c) plots the average divergence of neighboring points in the reconstructed col_major CACHE dynamics over a range of different time intervals, computed using TISEAN's lyap_r tool. This plot contains a scaling region in the first 1.5 million instructions of program execution for 14 ≤ m ≤ 15, and then saturates because the spread between the points tracked by the Rosenstein et al. algorithm reaches the diameter of the reconstructed attractor. The linear region indicates exponential divergence in the dynamics, and its slope is λ1 = 0.08 ± 0.002 per 100,000 instructions. As a check on this result, we also calculated the correlation entropy (h2), which is known to be an upper bound on the sum of the positive λi of a system[7]. Figure 3(d) shows these results, obtained using the TISEAN d2 tool over the ranges 15 ≤ m ≤ 25 and 25 ≤ ε ≤ 180 (curves again plotted in black for 15 ≤ m ≤ 25 and grey for m ≤ 15 to show the pattern with increasing m). From these curves, we calculated h2 = 0.15 ± 0.005, which is consistent with our estimate of λ1 = 0.08. Though neither λ1 estimate is large, this system does indeed appear to be sensitively dependent on initial conditions—i.e., the runtime of the program will be different each time it is invoked.
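
The logic of the Rosenstein method, in brief (our paraphrase of [9]): if nearby trajectory pairs separate exponentially, then the averaged log-divergence grows linearly in time and its slope is λ1,

    $$ d_j(i) \approx C_j\, e^{\lambda_1\, i\,\Delta t} \quad\Longrightarrow\quad S(i) = \big\langle \ln d_j(i) \big\rangle_j \approx \big\langle \ln C_j \big\rangle + \lambda_1\, i\,\Delta t $$

where $d_j(i)$ is the distance between the j-th pair after i time steps and $\Delta t$ is one sampling interval (here, 100,000 instructions). This is why λ1 is read off directly as the slope of the scaling region in Figure 3(c).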

C. Discussion

Nonlinear time-series analysis of the dynamics of any experimental process should of course be done with care, thought, and a close eye on the conditions of the underlying mathematics of the analyses. All of the associated calculations are highly sensitive to noise, data quantity, and the parameters of the methods (e.g., ε), and their results require expert human interpretation. To truly trust the results of any of these methods, one must vary the algorithm parameters, seek corroboration with different methods, and consider whether the data are adequate for the conclusions that one draws.

In our results, the various dimension values corroborate quite nicely, beginning with the topological dimension of the state space. The original theorems require that the embedding dimension m ≥ 2d, where d is the dimension of the underlying dynamics, but tighter bounds have since been established (e.g., m ≥ 2D_A, where D_A is the box-counting dimension[10]). The false-nearest neighbor method produced an estimate of m = 12 for the data of Figure 2(a), but that value is of course sensitive to the cutoff threshold used in the method. The geometry of the curves in Figure 3 also provides some useful feedback about embedding dimension, via the "asymptotic invariant" approach: the m value at which they stop changing is another measure of the dimension of the reconstructed state space. By that measure, Figures 3(a), (b) and (d) suggest m = 15, 15 and 8, respectively. Taken together, these results suggest that a successful embedding of these data requires between 12 and 15 dimensions. The attractor itself occupies only part of this space, of course: 0.86 ± 0.1 according to the correlation exponent, or 0.83 ± 0.03 according to the more-accurate correlation dimension.

Dimension is also important here because it dictates how much data one needs for a successful embedding. Smith proposes the following lower bound on the number of points needed to properly reconstruct an attractor[11]: $42^M$, where M is the next-largest integer above the topological dimension of the attractor. However, Smith's methods are a topic of some controversy: other authors (e.g., Tsonis et al.[12]) feel that Smith's estimates are overly pessimistic. While the length of the time series used in this section (86138) falls below Smith's requirements, it exceeds Tsonis's requirement ($10^{2+0.4M}$) by two orders of magnitude. When we increased the run length, we observed negligible effects upon the results presented above, which also indicates that our data length is adequate.

This cohort of fairly consistent numbers is a strong indication of low-dimensional dynamics—both the topological dimension of the reconstructed state space (∼12–15) and the part of that space that is occupied by the trajectory (∼0.8). This is somewhat surprising in a system as complex as a microprocessor. Modern hardware is composed of many tightly coupled subsystems, however; the execution unit of the microprocessor proceeds only when it receives data from the local cache, for instance. While the number of transistors in the system is extremely large, this coupling—as in other dynamical systems—reduces the effective dimensionality of the system (cf. millions of planetesimals coalescing into a single rigid body). We conjecture that the coupling of subsystems in a computer is responsible for the low-dimensional dynamics observed here.


IV. THE IMPACT OF ARCHITECTURE UPON DYNAMICS

A key hypothesis behind the framework for computer dynamics that we propose here is reflected in the $\vec{F}_{impl}$ of equation (2): different implementations may have different dynamics, even if both adhere to the same ISA specification. To test this hypothesis, we ran a series of experiments involving the same code as in the previous section, but on an Intel Pentium 4 instead of an Intel Core2 microprocessor. These two processors not only share the same ISA, but also the same manufacturer; nonetheless, the dynamics are very different.

A. Methods

We compiled the col_major loop shown in Section III on a Pentium 4 with the gcc compiler, version 4.1, at optimization level "-O2" in order to match the methods of the prior section, then measured the cache-miss behavior (CACHE) every 100,000 instructions. As in the prior section, we also captured machine-cycle data (CYCLES), and we used the same runtime, frequency-spectrum, and correlation methods to verify that the observed behavior was not a function of the sampling frequency.

B. Dynamics

The time-series data for the CACHE metric of col_major on the Pentium 4 is shown in Figure 4(a). A close visual comparison of this plot and Figure 2(a) makes it clear that $\vec{F}_{impl}$ matters: the dynamics on the Pentium 4 are periodic, rather than chaotic, even though the two processors—both of which are x86 machines—are running the same code. Moreover, the Pentium 4 col_major dynamics are not robust: roughly 70% of the time, the performance evolution of the program is periodic, as in Figure 4(a); the other 30% of the time, the system dynamics look like noise (or extremely high-dimensional dynamics). This variability speaks to the sensitivity of the system to hidden parameters. In an effort to isolate them, we repeated the experiment multiple times, rebooting the computer between runs. The fact that the dynamics were so variable despite this procedure—which should, according to hardware design principles, reset the system to the same internal state—indicates that there is a bifurcation parameter somewhere in the (proprietary) implementation logic that is not adhering to that design principle. One likely suspect in this case is the "trace cache" mentioned in Section II C: a structure that the chip's designers believed would streamline the execution process, but that actually injected some apparent non-determinism into the performance dynamics. The dynamical-systems approach taken in this paper not only provides a useful corroboration of the anecdotal reports about this behavior in the computer performance literature; it may also provide a way to isolate and understand its causes.

FIG. 4: Memory accesses during the execution of col_major on an Intel Pentium 4: (a) time series of the number of L2 cache misses per sampling interval; (b) two-dimensional projection of the data from (a), embedded with τ = 1 and m = 12.

The rest of this section describes the periodic behavior that dominates⁶ the dynamics of this hardware/software combination, and compares it to the non-periodic dynamics of the Core2. We began by using delay-coordinate embedding to reconstruct the Pentium 4 dynamics from Figure 4(a), following the same approach as in Section III. The embedding parameters given by the average mutual information and the false nearest neighbor methods, respectively, were τ = 1 and m = 12—the same values as in the previous section, as discussed below. A two-dimensional projection of the embedded attractor is shown in Figure 4(b).

To compare the dynamics of col_major on the two different processors, we calculated recurrence plots of the embedded trajectories in Figures 2(b) and 4(b). The results, shown in Figure 5, clearly bring out the differences between the two systems. A recurrence plot[13] is a two-dimensional representation of the correlations in a trajectory. A point (i, j) on this plot is colored black if the distance between the ith and jth points in the time series is less than some threshold δ. We use δ = 3.9 × 100,000 instructions. We choose δ by balancing two constraints: too small a value produces too many false positives—points that are said to be "close" when in reality they are not—while too large a value produces too few points to make out any pattern. Recurrence plots bring out the dynamical differences very clearly here. Figure 5(a) shows the recurrence plot of the embedded trajectory from col_major run on the Intel Core2 processor; part (b) is a recurrence plot of the same program on the Intel Pentium 4. In these plots, a periodic signal manifests as a set of diagonal lines, and Figure 5(b) is a classic example of this. The noisy, somewhat-banded structure of Figure 5(a) is clearly very different.

FIG. 5: Recurrence plots of the col_major dynamics on different processors: (a) Core2 and (b) Pentium 4.
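
Generating such a plot is straightforward; a sketch (ours) is below: every pair (i, j) whose embedded points lie within δ of each other gets a dot.

    #include <math.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Recurrence plot: emit one (i, j) pair per "black dot," i.e., whenever the
       m-dimensional embedded points x_i and x_j are closer than delta. */
    void recurrence_plot(const double *x, size_t N, size_t m, double delta) {
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++) {
                double d2 = 0.0;
                for (size_t k = 0; k < m; k++) {
                    double d = x[i * m + k] - x[j * m + k];
                    d2 += d * d;
                }
                if (sqrt(d2) < delta)
                    printf("%zu %zu\n", i, j);
            }
    }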

C. Discussion

These results make it clear that the dynamics of a program's evolution depend both upon the code that is running ($\vec{F}_{code}$) and the hardware upon which it is run ($\vec{F}_{impl}$). The same piece of code causes periodic cache-miss behavior on one Intel microprocessor and aperiodic—probably chaotic—behavior on another. Note that the τ and m values for the Pentium 4 embedding are identical to those for the Core2 embedding, suggesting that the topological dimensions of the two systems' state spaces are similar, even though the dynamics of their trajectories are different. This point is particularly interesting in view of the enormity of the potential state space and the differences that we have noted about these two processors. The similarity in our estimates of the state-space dimension is likely because the dynamics are dominated by the memory subsystem of the microprocessor. The reason for the difference in the geometry of the trajectories is not clear to us at this point.

The dynamical disparity between machines has important implications for the microprocessor community. Many of the design optimizations used by processor architects rest upon simple assumptions about software and broad generalizations about its interactions with the hardware. The results presented in this section suggest that these simplifications are not always valid, and they explain why they can lead to counter-intuitive results (viz., the trace cache's detrimental effects upon performance). The fact that all of this dynamical richness can be exposed by a program as simple as col_major is quite compelling, but real computer programs are far more complex and their dynamics could be dominated by different effects. The following section explores these issues.

V. SUPERPOSITION DYNAMICS OF COMPUTER PROGRAMS

The simple loop used in the previous sections is what the architecture community calls a "micro-kernel": a simple piece of code whose dynamics should be easy to understand. Micro-kernels feature prominently in the standard approaches to performance analysis, which break programs down into pieces, analyze each piece, and then compose the results. In a linear world, this kind of reductionist analysis can be useful, but superposition does not necessarily work in nonlinear dynamical systems like computers.

To explore this issue, and to take a step towards dynamical analysis of more-complex programs, we performed an experiment that combined the loop used in Sections III and IV with another simple loop that accesses the processor's memory in a different pattern.

We ran this experiment on the microprocessor used in Section III: an Intel Core2. The code involved two simple loops: the col_major loop of Section III and another matrix-initialization loop, shown below:

    row_major:
      for (i = 0; i < 255; i++)
        for (j = i; j < 255; j++)
          data[i][j] = 0;

This loop switches the data access pattern of the previous loop from column-major order to row-major. Because this access pattern works much better with the design of the computer memory, this loop runs an order of magnitude faster than the column-major loop in the previous sections.

FIG. 6: Memory accesses during the execution of row_major on an Intel Core2: (a) time series of the total number of cycles in each sampling interval during which the L2 cache is busy fetching data from main memory; (b) two-dimensional projection of the data from (a), embedded with τ = 1 and m = 12.

We first evaluated the stand-alone dynamics of this loop, then constructed a program that alternated between both loops, running each one 20,000 times and then repeating. We compiled all of this code using the same compiler and options as in both previous examples. As before, we measured cycle and cache behavior every 100,000 instructions and checked the correlations and frequency spectra at higher and lower sampling rates to ensure that the sampling methodology was not affecting the data.
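
For concreteness, a driver corresponding to this description might look like the following (our sketch; the array size and repetition count come from the text, and the five-repetition cap is only there to make the example terminate, matching the five sequences shown in Figure 8(a)).

    #define N 255
    static int data[N][N];

    static void col_major(void) {
        for (int i = 0; i < N; i++)
            for (int j = i; j < N; j++)
                data[j][i] = 0;   /* column-major walk: poor cache behavior */
    }

    static void row_major(void) {
        for (int i = 0; i < N; i++)
            for (int j = i; j < N; j++)
                data[i][j] = 0;   /* row-major walk: cache-friendly */
    }

    int main(void) {
        /* Alternate between the two loops, 20,000 iterations each, and repeat. */
        for (int rep = 0; rep < 5; rep++) {
            for (int r = 0; r < 20000; r++) col_major();
            for (int r = 0; r < 20000; r++) row_major();
        }
        return 0;
    }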

Figure 6 shows the time-series data for the cache behavior of the row_major loop, when run alone, as well as a two-dimensional projection of the corresponding embedded trajectory. The embedding parameters, obtained with the same standard methods described in previous sections, were τ = 1 and m = 12—values identical to those obtained in all of our previous experiments. The geometry of the reconstructed dynamics has some distinct patterns. It resembles Figure 2(b)—col_major on the same processor—but is more continuous, lacking the islands of points, and it contains several "ghost" copies of the main triangular structure. These ghosts are not transients or artifacts; they persist and fill in if we lengthen the run. Like the dynamics of Section III—and unlike the corresponding run of this code on a Pentium 4—these dynamics are robust, repeatable over a large number of runs.


FIG. 7: Calculations of the dynamical invariants of Figure 6(b): (a) correlation sum C(m, ε) vs. neighborhood size ε; (b) correlation dimension, d ln C(ε)/d ln ε vs. ε; (c) largest Lyapunov exponent, S(ε, m, t) vs. time; (d) correlation entropy h₂(m) vs. ε. The multiple traces reflect runs of the associated calculations with different m values for corroboration purposes, as described in the text.

To study the dynamics of this hardware/software system and compare it to the previous experiments, we again calculated a few standard invariants, beginning with the correlation exponent. Figure 7(a) shows the results; a fit to the scaling region of this family of curves (25 ≤ ε ≤ 210) gives v = 1.14 ± 0.02. To calculate the correlation dimension, we fit a line to the scaling region 25 ≤ ε ≤ 180 in the curves of Figure 7(b), obtaining D2 = 1.16 ± 0.009. In both calculations, we again varied the embedding dimension over 6 ≤ m ≤ 25 as a check on both the associated result and the quality of the embedding. As in the col_major dynamics on this same processor, these two dimension estimates are consistent with the v ≤ D2 bound of [8]. The row_major dynamics has a slightly higher dimension, though: ∼1.1, compared to col_major's ∼0.8, reflecting the volume of state space occupied by the attractor. As before, we calculated the largest Lyapunov exponent, λ1, from the slope of the scaling region in Figure 7(c), obtaining 0.06 ± 0.001. The correlation entropy h2, calculated over 20 ≤ ε ≤ 200 of Figure 7(d), was h2 = 0.05 ± 0.05⁷. These two numbers are, as in our previous Core2 experiments, consistent with the bounds proposed in [7]. Both are positive, suggesting that the performance of row_major, like that of col_major, is sensitively dependent on initial conditions, but the values (∼0.05) are really too low to draw any firm conclusions about chaos.

These features and measures—robust, non-periodic dynamics with a positive Lyapunov exponent and fractal state-space structure—suggest that the performance dynamics of row_major running on a Core2 microprocessor are—like those of col_major on that same processor—chaotic. The attractor geometry is different, as is visible even in the two-dimensional projections of Figures 6(b) and 2(b) (e.g., the "islands" in the former). This is reflected in the ν and D2 results, which are somewhat higher for row_major. All of this makes good sense from a computer-performance standpoint as well. The col_major code is "memory bound," in the terminology of the field, so its dynamics are completely dominated by that one subsystem of the computer. row_major is not memory bound, which means that other subsystems also play important roles in its dynamics. Thus a more-complex attractor, and a higher estimate for its dimension, appeal to our intuition as computer architects, further demonstrating the effectiveness of the dynamical systems approach and the framework of equation (2) for problems in this field.
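The largest-Lyapunov-exponent curves in Figure 7(c) come from the algorithm of Rosenstein et al. [9]. A minimal sketch of that divergence-curve calculation follows; the Theiler-window size and the small regularizing constant are illustrative choices of ours, not values from the experiments.

```python
import numpy as np

def divergence_curve(points, t_max, theiler=50):
    """S(t): mean log distance between each embedded point and its
    nearest neighbor, tracked t steps forward (Rosenstein et al. [9]).
    The slope of the linear region of S(t) estimates lambda_1."""
    n = len(points) - t_max
    S = np.zeros(t_max)
    for i in range(n):
        # Distance from point i to every candidate neighbor.
        d = np.linalg.norm(points[:n] - points[i], axis=1)
        # Exclude the point itself and its temporal neighbors
        # (Theiler window), which are trivially close.
        d[max(0, i - theiler) : i + theiler + 1] = np.inf
        j = int(np.argmin(d))
        for t in range(t_max):
            gap = np.linalg.norm(points[i + t] - points[j + t])
            S[t] += np.log(gap + 1e-12)  # guard against zero distance
    return S / n
```

This brute-force version is quadratic in the number of embedded points; production implementations such as the TISEAN tools described in [7] use neighbor-search structures, but the estimate is the same.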

Real-world computer programs, of course, are far more complex than the simple loops

studied here, and so it is unclear whether these results will be useful in practice. Our final

experiment addressed this question, exploring the composition dynamics of the two loops

by running them in an interleaved fashion (20,000 iterations each). Figure 8(a) shows a

segment of the corresponding memory-access time series. The alternating nature of the

time series reflects the different memory-use patterns of the two loops: as described above,

col_major uses the computer’s memory much less efficiently, causing more cache misses,

while row_major’s average miss rate is much lower.

The standard procedures for calculating the embedding parameters do not work on this

data. The dynamical time scales are dominated by when we switch between row_major and

col_major, rather than by the dynamics of either piece of code (or their combination).

As a consequence, the first minimum of the mutual information curve (τ) is at almost half

the length of the time series. Any embedding with this kind of τ value is suspect because

of the Smith/Tsonis limits described on page 16, so one cannot reliably calculate dynamical


[Figure 8 image: (a) CACHE vs. time (instructions × 100,000); (b) CACHEt+1 vs. CACHEt.]

FIG. 8: Memory accesses during the execution of an interleaved row_major/col_major loop on an Intel Core2: (a) time series of the total number of cycles in each sampling interval during which the L2 cache is busy fetching data from main memory; (b) two-dimensional projection of the data from (a), embedded with τ = 1 and m = 12. Note that this is not a topologically faithful embedding, as described in the text.

invariants from it. We could of course generate a longer time series: say, several thousand

iterations of the row_major/col_major sequence rather than the five shown in Figure 8(a).

But the dynamical invariants of that time series would capture the switching behavior, not

the dynamics of the two regimes.
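The τ-selection procedure at issue here is the mutual-information heuristic of Fraser and Swinney [5]: embed with the delay at the first minimum of the average mutual information I(τ). A minimal sketch follows; the fixed-grid histogram estimator is a common simplification of their adaptive partition, and the bin count is an illustrative choice of ours.

```python
import numpy as np

def mutual_information(x, tau, bins=64):
    """Average mutual information I(tau) between x(t) and x(t + tau),
    estimated from a 2-D histogram of the pairs (x(t), x(t + tau))."""
    joint, _, _ = np.histogram2d(x[:-tau], x[tau:], bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    nz = pxy > 0                      # skip empty cells: avoids log(0)
    return np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))

def embedding_delay(x, max_tau):
    """First minimum of I(tau): the standard choice for the delay."""
    mi = [mutual_information(x, t) for t in range(1, max_tau + 1)]
    for t in range(1, len(mi)):
        if mi[t] > mi[t - 1]:
            return t                  # mi[t-1] corresponds to tau = t
    return max_tau                    # no minimum found in range
```

On the interleaved trace, this first minimum lands near half the series length, which is exactly the pathology described above: the statistic is dominated by the switching period, not by either regime's dynamics.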

From a dynamics standpoint, it is quite clear what is going on here: Figure 8(a) reflects a system under the influence of a periodic series of externally forced bifurcations. Every 20,000 iterations, ~Fcode changes, and the dynamics change with it. One would expect this change to cause the trajectory geometry to switch between Figure 6(b) and Figure 2(b), and that appears to be exactly what happens. This is evident in a correlation plot of the data, as shown in Figure 8(b). Here, we plot the CACHE(t) metric from Figure 8(a) against CACHE(t+τ) for τ = 1—the same τ as in the other embeddings in this paper—for the entire data set (800,000 samples). Note that this is not a faithful reconstruction, nor does it

bear any topological similarity to the true dynamics; it is presented here simply for visual

comparison. Despite its formal limitations, the geometry of this plot is interesting. The

small triangle at the top right, which corresponds to the part of the trace where col_major is

running, is identical to Figure 2(b). A magnified view of the blob of points at the bottom left

of Figure 8(b), which represents the row_major part of the superposition experiment, closely

resembles Figure 6(b). The correlation plot includes a few stray points at the transitions—


points that have some coordinates in row_major and others in col_major (or in the transient

regime where the processor is switching between them).
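The correlation plot itself is trivial to produce. The sketch below shows the construction used for Figure 8(b), under the assumption that the trace is available as a NumPy array (here called cache, a name of ours); it is a visualization aid only, for the reasons given in the caption.

```python
import matplotlib.pyplot as plt

def correlation_plot(x, tau=1):
    """Scatter x(t + tau) against x(t): a two-dimensional projection of
    the trace, not a topologically faithful embedding."""
    plt.scatter(x[:-tau], x[tau:], s=1, alpha=0.3)
    plt.xlabel("CACHE(t)")
    plt.ylabel(f"CACHE(t+{tau})")
    plt.show()

# e.g., correlation_plot(cache, tau=1) on the 800,000-sample trace
```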

This interpretation has important implications for real-world computer programs, which

are made up of a large number of loops, interleaved in complicated ways. Among other

things, the results presented above suggest that one should not blindly drop time-series data

from a computer into nonlinear time-series analysis tools. A computer program is a time-

varying ~Fcode in the dynamics of equation (2), and the time period between the bifurcations

in ~Fcode may or may not be long enough to allow transients to die out. A trace from a

running program, then, is a mix of transient and asymptotic dynamics from a variety of

different ~Fcodes. It makes no sense to calculate any dynamical “invariant” of such a time

series.

In spite of this, dynamical systems tools can still be useful in real-world computer appli-

cations. Many standard programs are dominated by a single loop, for instance. The popular

bzip2 compression tool spends almost 50% of its time in a loop that consists of only six

instructions, making it amenable to the kind of analysis used here. Other popular bench-

marks, like vpr, move around through different loops (see note 8). It is not surprising, then, that prior work has found a strong indication of chaotic dynamics in bzip2 but inconclusive results for vpr [14].

Our superposition experiment not only explains this, but suggests some broader uses of

dynamical systems methods in computer performance analysis. Instead of attempting to

understand the dynamics of an entire program, it may be useful to extract the key segments

and analyze them in isolation, then build up from that understanding. The dynamical

switches explored in Figure 8 tie into another important task in the computer systems field.

Analysts think about computer programs as divided into phases—linear sequences of code

with no branch points—but they have no good way to find those phases automatically. Our

results suggest that a bifurcation analysis could be useful in doing so, though it would of

course be limited by the transient/attractor issues mentioned two paragraphs previously.
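As a very rough illustration of automatic phase finding, the heuristic below flags points where the trace departs sharply from its recent rolling statistics. This is a crude mean-shift detector of our own devising, standing in for the bifurcation analysis proposed above; the window size and threshold are arbitrary, and the input is assumed to be a NumPy array.

```python
import numpy as np

def phase_switches(x, window=500, z=4.0):
    """Flag indices where the trace jumps more than z rolling standard
    deviations away from the rolling mean of the preceding window --
    candidate phase boundaries for closer dynamical analysis."""
    switches = []
    for i in range(window, len(x)):
        seg = x[i - window : i]
        mu, sd = seg.mean(), seg.std() + 1e-9   # guard against sd == 0
        if abs(x[i] - mu) > z * sd:
            switches.append(i)
    return switches
```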

Another important application for dynamical systems techniques in computer analysis and design is verification. Validation of a simulator, for instance, is critical to its utility, but computer architects usually do not consider the time-varying behavior of the machine and the simulator when "validating" the latter. Instead they rely on aggregate metrics, such as the total number of cache misses over the entire program run, as a means


of validation [15]. Our preliminary experiments [16] show that programs run very differently on simulators than on real hardware. (The experiments in [14] involved a simulator, inci-

dentally, not real computer hardware.) Given the framework of equation (2), this is easy

to understand: the simulator is not replicating ~Fimpl—not surprising, given how delicate

the results presented here suggest that task to be. Dynamical systems techniques could

arm architects with powerful tools to make effective comparisons of simulators and of real

computers.

VI. CONCLUSION

In this paper, we have proposed a nonlinear dynamics-based framework for modeling and

analyzing computer systems. This framework lets us use standard methods from the dy-

namics literature to understand the behavior of these engineered systems. Using a custom

measurement infrastructure and delay-coordinate embedding, we found strong indications,

from multiple corroborating methods, of low-dimensional chaotic dynamics in the perfor-

mance of a simple program running on a popular Intel processor. The nature of the dy-

namics changed completely when the same program was run on a different Intel processor,

even though its design adhered to the same specification, affirming the role of ~Fimpl in the

dynamics. Changing the program also changed the dynamics, verifying the presence of ~Fcode

in the model of equation (2). When the two programs were interleaved in time, the dynamics

alternated accordingly, leading us to a view of a computer as a dynamical system

under the influence of a periodic series of externally forced bifurcations.

While these results validate the general form of equation (2), what we would really like to

do, of course, is reverse engineer the actual form of the dynamics: that is, the functions ~Fcode

and ~Fimpl and the way in which they compose. Deducing the system derivative from a time

series is one of the hard, open problems in nonlinear system identification, and nonlinear

dynamics only offers a partial solution (viz., a reconstruction that has the same topology as

the true dynamics). Even though this partial solution does not help us deduce the precise

form of the underlying dynamics or the nature of its state variables, it is still quite useful.

Not only does it facilitate understanding of the nature of the dynamics and the interacting

roles of hardware and software, as described in the previous paragraph. The results in this

paper also make it very clear that computers are complex nonlinear systems, whose behavior defies


the traditional analysis techniques used in that field. All of this raises fundamental questions

about the match between different computers, both real and simulated. Computer architects

use simple, linear and aggregate measures to “validate” a simulator against a real machine,

for instance, and then rely on that simulator to predict how new design features would affect

the behavior of that real machine. In view of the complex nonlinear interactions that give

rise to the dynamics that we report in this paper, that approach is clearly flawed. The ideas

and methods developed by the nonlinear dynamics community are, in our opinion, a much

better way to study, understand, and design modern computer systems.

[1] J. R. Bulpin and I. A. Pratt, Workshop on Duplicating, Deconstructing, and Debunking

(WDDD04) (2004).

[2] T. Sauer, Chaos 5, 127 (1995).

[3] K. Christensen, L. Danon, T. Scanlon, and P. Bak, PNAS 99, 2509 (2002).

[4] N. Bastian, B. Ercolano, M. Gieles, E. Rosolowski, R. Scheepmaker, R. Gutermuth, and

Y. Efremov, Mon. Not. R. Astron. Soc. 379, 1302 (2007).

[5] A. M. Fraser and H. L. Swinney, Phys. Rev. A 33, 1134 (1986).

[6] M. B. Kennel, R. Brown, and H. D. I. Abarbanel, Phys. Rev. A 45, 3403 (1992).

[7] R. Hegger, H. Kantz, and T. Schreiber, Chaos 9, 413 (1999).

[8] P. Grassberger and I. Procaccia, Physica D 9, 189 (1983).

[9] M. T. Rosenstein, J. J. Collins, and C. J. D. Luca, Physica D 65, 117 (1993).

[10] T. Sauer, J. A. Yorke, and M. Casdagli, Journal of Statistical Physics 65, 579 (1991).

[11] L. Smith, Phys. Lett. A 133, 283 (1988).

[12] A. Tsonis, J. Elsner, and K. Georgakakos, J. Atmos. Sci. (1993).

[13] J.-P. Eckmann, S. Oliffson Kamphorst, and D. Ruelle, Europhys. Lett. 4, 973 (1987).

[14] H. Berry, D. G. Perez, and O. Temam, Chaos 16, 013110 (2006), URL http://link.aip.org/link/?CHA/16/013110/1.

[15] D. A. Penry, M. Vachharajani, and D. I. August, Proceedings of the Workshop on Modeling,

Benchmarking, and Simulation (MoBS) (2005).

[16] T. Mytkowicz, E. Bradley, and A. Diwan, Tech. Rep., University of Colorado at Boulder

(2007).


Notes

1. ASPLOS, ISCA, and MICRO.

2. Since these 32-byte variables are used as addresses.

3. We also tried it the other way and got identical results for the invariants, though the geometry of the attractor was of course different.

4. Each program run affects the contents of the state variables ~x, leaving a "footprint" in the computer's memory structures.

5. At small ε, noise affects the associated calculations; for large ε, the finite size of the embedded trajectory destroys the scaling.

6. The behavior that appears in the other 30% of the runs obviously defies analysis by the tools of low-dimensional dynamics.

7. We did this calculation for a range of m (8 ≤ m ≤ 25); the mean was 0.05 and the 99% confidence interval of the mean was ±0.05.

8. vpr is a CAD program that does placement & routing for integrated-circuit chips.

[Pages 29–48: standalone full-page renderings of the figures referenced in the text. Recoverable axis labels: cycles(x)/cycles(1e+05) vs. sampling interval (instructions); power vs. frequency (1/(100,000 instructions)); CACHE vs. time (instructions × 100,000); CACHEt+1 vs. CACHEt return maps; and Correlation Sum C(m, ε), d ln C(ε)/d ln ε, S(ε, m, t), and Correlation Entropy h2(m) vs. neighborhood size ε.]