

Can You Trust It?
On the Reliability of Computer Simulation and the Validity of Models

BY JOHN L. CASTI

John Casti received his Ph.D. in mathematics under Richard Bellman at the University of Southern California in 1970. He worked at the RAND Corporation in Santa Monica, CA, and served on the faculties of the University of Arizona, NYU, and Princeton before becoming one of the first members of the research staff at the International Institute for Applied Systems Analysis (IIASA) in Vienna, Austria. In 1986 he left IIASA to take up his current post as a Professor of Operations Research and System Theory at the Technical University of Vienna. He is also a resident member of the Santa Fe Institute in Santa Fe, New Mexico, USA. In addition to numerous technical articles and a number of research monographs, Professor Casti is the author of several volumes of trade science, including Paradigms Lost, Searching for Certainty, Complexification, and Would-Be Worlds.

© 1997 John Wiley & Sons, Inc., Vol. 2, No. 5. CCC 1076-2787/97/05008-04

THE WORLD OF SIMULACRA

In January 1995, the Las Vegas oddsmakers established the San Francisco 49ers as 19-point favorites over the San Diego Chargers in Super Bowl XXIX. When I saw this point spread, I simply didn’t believe it. No team, I thought, could reach the pinnacle event of the National Football League season and be so superior to its opponent. So to get some statistical insight into whether this seemingly outrageous point spread made any sense, I fired up a rather detailed simulation program for playing American professional football games using the playing characteristics of the actual NFL players, and set one of my computers working for a week or so playing and replaying this game 100 times on the electronic gridiron.

In this run of 100 electronic copies of Super Bowl XXIX, it turned out that the 49ers were indeed the superior club, winning 54 of the contests by an average margin of victory of just under seven points. But not once in these 100 games did the 49ers exceed the 19-point margin of victory set by the Nevada bookies. Armed with this fact, I confidently placed my loyalties and my money on the Chargers and sat back to enjoy the game. As everyone knows by now, the 49ers won the actual contest by a count of 49 to 26. So much for simulated worlds—not to mention my investment!
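A minimal sketch of this kind of Monte Carlo experiment: replay a crude game model many times and tally how often the favorite wins and covers the spread. The drive-level scoring model and the strength parameters below are invented stand-ins, not the player-level simulator described above.

```python
import random

random.seed(1995)

def simulate_game(p_score_fav=0.45, p_score_dog=0.38, drives_per_team=12):
    """Toy game model: each team gets a fixed number of drives and scores a
    touchdown (7 points) on each with some probability. A deliberately crude
    stand-in for a detailed, player-level football simulator."""
    fav = sum(7 for _ in range(drives_per_team) if random.random() < p_score_fav)
    dog = sum(7 for _ in range(drives_per_team) if random.random() < p_score_dog)
    return fav - dog  # favorite's margin of victory (negative means an upset)

N_GAMES, SPREAD = 100, 19
margins = [simulate_game() for _ in range(N_GAMES)]

wins = [m for m in margins if m > 0]
covers = sum(1 for m in margins if m > SPREAD)

print(f"Favorite won {len(wins)} of {N_GAMES} games")
if wins:
    print(f"Average winning margin: {sum(wins) / len(wins):.1f} points")
print(f"Covered the {SPREAD}-point spread in {covers} of {N_GAMES} games")
```

Even such a toy model makes the logic of the bet plain: if the simulated favorite rarely wins by more than the posted spread, the spread looks too generous, whatever the eventual single-game outcome.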

The point of this sad little story is not to lament my loss, which, in fact, was recouped the next year in Super Bowl XXX between Dallas and Pittsburgh using the results of a similar computational experiment. Rather, the take-home message here is that there is an eternally unbridgeable gap between the real world of the NFL and the “would-be” world of what’s going on inside my computer. But in a real world in which computer models are used to supply input to decisions in which vast sums of money, not to mention millions of lives, are at stake, can we really trust these simulations far enough to put these dollars and lives on the line? And why do we have to put our trust in these models at all? These are two of the questions that will shape much of the science of the 21st century.

“HYPOTHETICALITY”

Some years back, Professor Wolf Häfele, former director of the German Nuclear Research Center in Juelich, coined the term “hypotheticality” to refer to situations in which we have to make life-or-death decisions but are unable to perform the experiments or tests needed to gain much-needed information about the situation before having to make the decision. At the time, Häfele’s concern was mostly with the possible dangers—real and fanciful—seen by environmentalists and other concerned citizens worrying about the widespread use of nuclear power. But the term has much broader currency than merely as a way of describing the fact that there is no way to confirm or deny a particular claim about hypothetical dangers of nuclear reactors. It can refer equally well to almost any assertion one cares to make about the behavior of what we now call a complex, adaptive system. Such questions arise everywhere, from the effect of carbon dioxide in the atmosphere on overall global warming, to the possible danger of genetically engineered plants, to government policies, and to the possible benefits of various vaccines proposed for the AIDS virus.

What all these examples have in common—and the list could be greatly extended in almost any direction one cares to look—is that we are faced with having to make critical choices about a system whose workings are almost a total mystery to us. Each of the processes sketched above is an example of a complex, adaptive system consisting of a large number of individual agents—investors, virus molecules, genes—that can change their behavior based on information they receive about what the other agents in the system are doing. Moreover, the interaction of these agents then produces patterns of behavior for the overall system that cannot be understood or even predicted based on knowledge about the individuals alone. Rather, these emergent patterns are a joint property of the agents and their interactions—both with each other and with their ambient environment. The ability of such systems to resist analysis by the traditional reductionistic tools of science has given rise to what is now called the sciences of complexity, involving the search for new theoretical frameworks and methodological tools for understanding these systems. And probably the largest cannon in this arsenal is simulated worlds constructed inside the digital computer. But to use this weapon effectively, we have to understand the differences between a computer model and the real thing—and, most importantly, when that difference matters.
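To make the idea of adapting agents concrete, here is a minimal sketch in the spirit of the El Farol bar problem studied at the Santa Fe Institute: each agent decides whether to act based on public information about what the crowd did before, and the aggregate pattern that results belongs to no single agent. The memory lengths, tolerances, and update rule are illustrative inventions, not a model from this article.

```python
import random

random.seed(1)

N_AGENTS, N_ROUNDS, CAPACITY = 100, 30, 60

# Heterogeneous agents: each looks back a different number of rounds and
# has a slightly different tolerance for crowding.
memories = [random.randint(1, 5) for _ in range(N_AGENTS)]
tolerances = [CAPACITY + random.randint(-15, 15) for _ in range(N_AGENTS)]

history = [50]  # public information: attendance counts from past rounds
for _ in range(N_ROUNDS):
    attendance = 0
    for mem, tol in zip(memories, tolerances):
        recent = history[-mem:]
        if sum(recent) / len(recent) < tol:  # the agent expects room, so it goes
            attendance += 1
    history.append(attendance)               # the aggregate feeds back to everyone

print(history)  # fluctuates irregularly near CAPACITY: an emergent pattern
```

No agent aims at the capacity figure, yet the attendance series hovers around it; the pattern is a joint property of the agents and their interactions, which is precisely what frustrates agent-by-agent, reductionistic analysis.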

REPRESENTATIONS AND REALITY

Suppose we want to inform ourselves about the looks of Gertrude Stein. Will we learn more from Picasso’s famous portrait of her or from a photograph? Which will be the better representation, the portrait or the photograph? Picasso’s portrait gets us closer to one of the most characteristic features of a good model: Such a model captures the essence of its subject. Roughly speaking, this means that the symbols and objects of the model are sufficiently rich to allow us to express the questions we want to ask about the slice of reality the model represents and, furthermore, that the model provides answers to these questions. Referring again to Picasso’s portrait of Gertrude Stein, if our question was about the color of her hair and Picasso had used green paint to portray it, then we would conclude that the portrait was not a good model at all—at least not for providing an answer to this particular question. On the other hand, if our interest were in knowing more about Gertrude Stein’s personality and inner soul, and if Picasso’s portrait displayed these aspects of her in a particularly transparent manner, then the portrait would be thought of as a very good model indeed of its subject—even if he had painted Stein with green hair. So the first and foremost test that a good model must pass is that it provide convincing answers to the questions we put to it. However, this is a kind of metacriterion that covers several more specific criteria for good modeling.

Although answering questions is the sine qua non of a good model, there are other characteristics that tend to separate the good from the bad. These include simplicity, clarity, freedom from bias, and tractability. When we consider various computer models of the real world, we are continually forced to address questions relating to how the model teaches us something about real things. Sometimes, the model will simply try to duplicate as closely as possible the real-world phenomena, as with the football simulation of the Super Bowl. In other cases, the model will be a deliberate caricature of the real system, specifically designed to exaggerate some aspect of the system at the expense of fidelity in other areas. Such an exaggeration aims at drawing our attention to a particular feature of the real-world system in much the same way that political cartoonists used to draw distorted versions of Richard Nixon’s ski-jump nose as a quintessential feature of his physiognomy. But rather than continuing to speak in such vague generalities, let me now give some specific examples of simulations, the questions we can ask of them, and the ways we can assess the answers.

THE COMPUTER AS A CALCULATOR

Everybody talks about the weather, and some folks can actually predict it—sort of. Figure 1 shows the predicted and observed record of rainfall and temperature from two computer weather-forecasting models over a five-year period in the Amazonian regions. The computer models used to make these predictions were those of the Canadian Climate Center (CCC) and the Goddard Institute for Space Studies (GISS). The regions covered by the models are shown in part (b) of the figure, while their projections for the climatic variables appear in part (a), along with the actual observations taken from data in the World Weather Record (WWR). The CCC and GISS predictions were computed using models with somewhat different grid spacings: nine 5-degree-by-5-degree squares in the CCC model, and four 8-degree-by-10-degree cells in the GISS model. Both simulations were run over a 20-year period, by which point the steady-state behavior had been reached. As a result, only the last five years of the simulated climatic trajectories are shown in the figure. What can we make of these results?

The seasonality of the rainfall record is stronger in both simulations than in the observed record. While the observations do show a strong seasonality, it is out of phase between the northern and southern Amazon. Thus, the fact that both models show the seasonality of the southern region indicates that the mathematical rain belts may be displaced northward from their real-world counterparts, especially in the CCC model. Although these factors do not necessarily prove much, particularly since the WWR data are themselves far from “clean,” the results do illustrate some of the principal difficulties involved in assessing climatic models.
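One such difficulty, checking whether a simulated seasonal cycle is even in phase with the observed one, can be made concrete with a small sketch. The monthly rainfall series below are invented placeholders, not the WWR, CCC, or GISS data, and the peak-month and error measures are simply the most elementary choices.

```python
import math

# Invented monthly rainfall climatologies (mm/month), January through December,
# standing in for an observed record and a model's output for the same region.
observed  = [280, 300, 310, 260, 180, 110, 80, 70, 100, 170, 230, 270]
simulated = [180, 230, 290, 310, 280, 200, 120, 90, 80, 110, 140, 160]

def peak_month(series):
    """Month (1-12) of maximum rainfall: a crude measure of seasonal phase."""
    return series.index(max(series)) + 1

def rmse(a, b):
    """Root-mean-square difference between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

print("Observed peak month: ", peak_month(observed))
print("Simulated peak month:", peak_month(simulated))  # a phase shift shows up here
print(f"RMSE: {rmse(observed, simulated):.1f} mm/month")
```

A model can score tolerably on an aggregate error measure and still place the rainy season in the wrong months, which is exactly the kind of displacement at issue in the Amazon comparison.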

To quote Ann Henderson-Sellers, a lively player in the climatology game, “In specific locations for which observational data can be ascertained, slight shifts in the model circulation combine with the coarse [grid] resolution to make validation very difficult.” The tone of this somewhat inconclusive comment seems to characterize the current state of much of long-term climate modeling. While one can pose rather specific questions to these sorts of simulations, evaluation of the outputs is fraught with various interpretive pitfalls, due to the different assumptions about the physics of the atmosphere built into the models, vagaries of available data, and the accuracy of the answers needed to make policy decisions on things like CFC emissions. Now let’s look at a quite different type of simulation, but one that, like the football simulation discussed above, is also a high-resolution picture of a real-world situation—an urban road-traffic network.

[FIGURE 1. Predicted and Actual Rainfall and Temperature Records.]

SIMULATED ROAD TRAFFIC

A few years ago, the U.S. Environmental Protection Agency set forth regulations (the Clean Air Act of 1990) specifying environmental impact standards for just about any change that anyone might want to make to anything that involves the air, earth, fire, and water constituting the human habitat. In particular, these standards apply to proposed modifications to road-traffic systems, changes such as the construction of high-speed rail links, the addition of a bridge, or the construction of a freeway. Unfortunately, there is no known way of actually assessing whether any proposed change of this sort actually meets the standards laid down by the Clean Air Act of 1990. In 1991, Los Alamos National Laboratory researcher Chris Barrett had the bright idea that computing technology had finally reached a level at which it should actually be feasible to build an electronic counterpart of a city like Albuquerque, New Mexico, complete with every single street, house, car, and traveler. Barrett thought that with such a surrogate version of the city inside his computer, it would then be possible to couple this silicon city to an air-pollution model so as to actually calculate directly the environmental impact of any proposed change to the road-traffic system. Happily, some visionary thinkers at the Federal Highway Administration of the U.S. Department of Transportation agreed with Barrett and provided the financial support needed to turn this fantasy into a reality.

At first glance, one might think that even with the state-of-the-art computing capacity available at a place like Los Alamos, it would be an impossible task to create an “Albuquerquia” of 200,000 or so households and more than 400,000 daily travelers moving about on 30,000 road segments. Not only does this represent an imposing database manipulation problem, but also there is the task of planning travel routes and keeping track of each of these thousands of travelers every second or so as they make their way through the network. But Chris Barrett is not the type of man to be discouraged by such minor obstacles, and, contrary to all expectations, he and his group succeeded in creating just such a would-be world. It is called TRANSIMS. Basically, it involves creating an electronic copy of the entire road-traffic network of Albuquerque inside a computer in order to study the creation and destruction of traffic patterns, congestion, and the like, over a 24-hour period.
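TRANSIMS tracks every traveler on every road segment, second by second. To give a flavor of how such microsimulation works at street level, here is a minimal single-lane sketch in the style of the Nagel-Schreckenberg cellular-automaton traffic model, which belongs to the same family of techniques; the road length, car count, and parameters are illustrative choices, not TRANSIMS internals.

```python
import random

random.seed(0)

ROAD_CELLS, V_MAX, P_SLOW, N_CARS, STEPS = 100, 5, 0.3, 25, 50

# road[i] holds the speed of the car occupying cell i, or None if empty.
road = [None] * ROAD_CELLS
for pos in random.sample(range(ROAD_CELLS), N_CARS):
    road[pos] = 0

def step(road):
    """One tick of traffic: accelerate, brake for the car ahead, dawdle at
    random, then move. All cars update in parallel on a circular road."""
    new_road = [None] * len(road)
    for i, v in enumerate(road):
        if v is None:
            continue
        v = min(v + 1, V_MAX)                       # accelerate toward V_MAX
        gap = 1
        while road[(i + gap) % len(road)] is None:  # distance to the next car
            gap += 1
        v = min(v, gap - 1)                         # brake to avoid collision
        if v > 0 and random.random() < P_SLOW:      # random dawdling
            v -= 1
        new_road[(i + v) % len(road)] = v           # move forward v cells
    return new_road

for _ in range(STEPS):
    road = step(road)

mean_speed = sum(v for v in road if v is not None) / N_CARS
print(f"Mean speed after {STEPS} ticks: {mean_speed:.2f} cells/tick")
```

Even this small toy reproduces a signature of real traffic: above a critical density, stop-and-go jams condense out of free flow and drift backward against the direction of travel, a pattern no single driver intends.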

As travelers are dropped into the network and begin making their way from one place to another, TRANSIMS enables us to take a god-like view of the system, zooming in wherever we wish to look at local traffic behaviors. These results suggest innumerable what-if games that can be played with this sort of tool. Just a few samples include:

• What effect would a new bridge across the Rio Grande have on rush-hour traffic?

• How would the traffic density on Menaul Boulevard (one of the principal east-west surface streets) change if the major east-west thoroughfares were changed to one-way flow, east to west, from 6 A.M. to 9 A.M.?

• What if traffic were metered entering the two interstate highways that bisect the city?

• How do the traffic-light patterns on Central Avenue (another major east-west artery) affect rush-hour densities on the bridges?

And so on and so forth. The list of questions that one can envision is endless—and not new. But with a laboratory like TRANSIMS at our disposal, for perhaps the first time ever, we do not have to build expensive bridges or do possibly dangerous tinkering with the system in order to answer them. It’s important to emphasize, however, that evaluating how much faith to put in the answers provided by the simulation still ultimately rests on informed, human judgment. Basically, all TRANSIMS can do is provide data and run experiments, which then must be interpreted and assessed by road-traffic engineers, urban planners, politicians, and other humans familiar with the actual Albuquerque traffic and socioeconomic systems.

All the simulations discussed thus far—NFL football, climate, road-traffic—have as their goal to build as faithful an electronic representation as possible of their real-world correlate system. But not all computer simulations have such fidelity as their aim; many deliberately give up detail for generality, which thus changes not only the nature of the questions we can put to such models, but also how we evaluate the answers.

MODEL VALIDATION

The literature on model validation is by now vast. While much of this work has been focused on various types of mathematical or verbal models, computer simulations cannot escape the same scrutiny. Let me list just a few of the criteria that have been put forth as axes along which to evaluate the validity of models.

• Operational: Is the model able to provide answers to the questions for which it has been built?

• Empirical: Does the model agree with observed data that are relevant to the problem under consideration? (A minimal check of this criterion is sketched after this list.)

• Theoretical: Does the model contradict any established theories?

• Consistency: Does the model contain any logical contradictions?

• Faith: Do specialists in the area being modeled agree that the model produces believable results?

• Testing: Can the model be tested in the real world?
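As a concrete, if toy, instance of the empirical criterion, the sketch below scores a model’s predictions against observations case by case. The data, the tolerance, and the pass/fail rule are invented placeholders; a real validation study would use domain data and a justified error bound.

```python
# Toy empirical-validation check: does the model agree with observed data
# to within a stated tolerance? All numbers here are invented placeholders.
observed  = [12.1, 13.4, 15.0, 14.2, 13.1]   # measurements of some quantity
predicted = [11.8, 13.9, 14.1, 14.6, 12.5]   # model output for the same cases

TOLERANCE = 1.0  # maximum acceptable absolute error per case

errors = [abs(o - p) for o, p in zip(observed, predicted)]
hits = sum(1 for e in errors if e <= TOLERANCE)

print(f"Within tolerance on {hits} of {len(errors)} cases")
print(f"Mean absolute error: {sum(errors) / len(errors):.2f}")
print("Pass" if hits == len(errors) else "Fail: refine the model or relax the question")
```

Even this crude check forces the two decisions the text keeps returning to: which quantity the model must get right, and how accurate is accurate enough.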

So here we have but a few of the more important ways that a model—computer or otherwise—can be evaluated. But what we learn from all the modeling exercises discussed earlier is that the question of how much trust we can place in computer cum mathematical models ultimately tends to come down to two principal issues: what question we want the model to answer, and how accurate that answer must be. Models and simulations are constructed with different levels of fidelity and with different purposes in mind. So just like a carpenter building a house, we have to pick the right tool for the job. Moreover, once the job is completed, we have to turn to expert opinion about the system under study to ultimately decide whether the answers provided by the model are satisfactory. So the final resolution of both these issues ultimately rests on expert human judgment; in short, mathematics plus computing does not equal magic.
