

Front cover photograph courtesy of NASA, capturing a view of Hurricane Bonnie 500 miles from Bermuda in September 1992. Reproduced with permission of Earth 2000 Ltd, PO Box 37, Bognor Regis, West Sussex, UK.

Inset picture: Representation of the 400-year-old Kepler Conjecture, which asserts that no packing of congruent spheres has a density greater than that of the commercially familiar face-centred cubic packing.

Editorial Coordinator: Cathy Brennan

(tel: +44 (0)20 7451 2633; fax: +44 (0)20 7976 1837;

[email protected]).

Journal Production Manager: Matthew Llewellin
Production Editor: Iain Finlayson

6–9 Carlton House Terrace, London SW1Y 5AG, UK

Scope. Phil Trans A concentrates on invited papers, in the form of issues on Themes and Discussions, concerning any aspect of the physical sciences and engineering, including mathematics and Earth sciences. Readers are welcome to propose Themes for consideration by the editorial board of the journal. For information and details on paper preparation, please consult the Instructions to Authors (see www.journals.royalsoc.ac.uk).

Typeset in Europe by the Alden Group, Oxford. Printed by the University Press, Cambridge

Editor: Professor J. M. T. Thompson FRS

Editorial Coordinator: Cathy Brennan

Editorial Board

J. M. T. Thompson, Editor, Department of Applied Mathematics and Theoretical Physics, Centre for Mathematical Sciences, University of Cambridge
A. J. Coates, Mullard Space Science Laboratory, University College London
A. G. Davies, School of Electronic and Electrical Engineering, University of Leeds
P. J. Dornan, Department of Physics, Imperial College
F. J. Dyson, School of Natural Sciences, Institute for Advanced Study, Princeton
R. S. Ellis, Astronomy, California Institute of Technology
P. Kohl, University Laboratory of Physiology, University of Oxford
J. Howard, Department of Chemistry, University of Durham
J. C. R. Hunt, Department of Space and Climate Physics, University College London
J. E. Marsden, Control and Dynamical Systems, California Institute of Technology
A. J. Meadows, Department of Information Science, Loughborough University
F. C. Moon, Sibley School of Mechanical Engineering, Cornell University
G. Stepan, Department of Applied Mechanics, Budapest University of Technology and Economics
I. N. Stewart, Department of Mathematics, University of Warwick
M. Tabor, Program in Applied Mathematics, University of Arizona
J. F. Toland, Department of Mathematical Sciences, University of Bath
H. Zhou, Department of Mechanics, Tianjin University

SUBSCRIPTIONS

Phil Trans A (ISSN 1364-503X) is published monthly. Full details of subscriptions may be obtained on request from the Subscriptions Sales Office, 6–9 Carlton House Terrace, London SW1Y 5AG (tel. +44 (0)20 7451 2646; fax +44 (0)20 7976 1837; [email protected]). The Royal Society is Registered Charity No 207043.

Subscription prices, 2006 calendar year (printed version plus electronic access):

Europe: £1174 (US$2167)
USA & Canada: £1238 (US$2288)
All other countries: £1269 (US$2346)

The Royal Society is an independent academy promoting the natural and applied sciences. Founded in 1660, the Society has three roles, as the UK academy of science, as a learned Society, and as a funding agency. It responds to individual demand with selection by merit, not by field. The Society’s objectives are to:

• strengthen UK science by providing support to excellent individuals

• fund excellent research to push back the frontiers of knowledge

• attract and retain the best scientists

• ensure the UK engages with the best science around the world

• support science communication and education; and communicate and encourage dialogue with the public

• provide the best independent advice nationally and internationally

• promote scholarship and encourage research into the history of science

For further information on the Society’s activities, please contact the following departments on the extensions listed by dialling +44 (0)20 7839 5561, or visit the Society’s Web site (www.royalsoc.ac.uk).

Research Support (UK grants and fellowships)
Research appointments: 2547
Research grants: 2539
Conference grants: 2540

Science Advice
General enquiries: 2585

Science Communication
General enquiries: 2572

International Exchanges (for grants enabling research visits between the UK and most other countries (except the USA))
General enquiries: 2550

Library and Information Services
Library/archive enquiries: 2606

Volume 461, number 2062, 8 October 2005

COPYRIGHT © 2005 The Royal Society

Except as otherwise permitted under the Copyright, Designs and Patents Act, 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publisher, or, in the case of reprographic reproduction, in accordance with the terms of a licence issued by the Copyright Licensing Agency. In particular, the Society permits the making of a single photocopy of an article from this issue (under Sections 29 and 38 of the Act) for an individual for the purposes of research or private study.

Hyperasymptotics for nonlinear ODEs. II. The first Painlevé equation and a second-order Riccati equation
A. B. Olde Daalhuis

Quantum fluid mechanical and electronic structure of a hydrogen-like atom
H. H. Chiu

How turbulence enhances coalescence of settling particles with applications to rain in clouds
S. Ghosh, J. Dávila, J. C. R. Hunt, A. Srdic, H. J. S. Fernando & P. R. Jonas

Cluster formation in complex multi-scale systems
J. D. Gibbon & E. S. Titi

Wave packet pseudomodes of variable coefficient differential operators
L. N. Trefethen

Eddy current coil interaction with a right-angled conductive wedge
T. P. Theodoulidis & J. R. Bowler

Asymptotic distribution method for structural reliability analysis in high dimensions
S. Adhikari

Bounds for some non-standard problems in porous flow and viscous Green–Naghdi fluids
R. Quintanilla & B. Straughan

Re-entrant corner flows of UCM fluids: the initial formation of lip vortices
J. D. Evans

Dynamic portfolio selection with nonlinear transaction costs
T. Chellathurai & T. Draviam

Integrable discrete differential geometry of ‘plated’ membranes in equilibrium
W. K. Schief

A Borg–Levinson theorem for trees
B. M. Brown & R. Weikard

Geometry of Calugareanu’s theorem
M. R. Dennis & J. H. Hannay

On the granular lubrication theory
J. Y. Jang & M. M. Khonsari

Modulations on turbulent characteristics by dispersed particles in gas–solid jets
K. Luo, J. Fan & K. Cen

Computational design of recovery experiments for ductile metals
N. K. Bourne & G. T. Gray III

A new calculation of the work of formation of bubbles and drops
J. Lewins

Eshelby formalism for nano-inhomogeneities
H. L. Duan, J. Wang, Z. P. Huang & B. L. Karihaloo


Preface

Phil. Trans. R. Soc. A (2005) 363, 2331–2333; doi:10.1098/rsta.2005.1660
Published online 12 September 2005

One contribution of 13 to a Discussion Meeting Issue ‘The nature of mathematical proof’.

© 2005 The Royal Society

Mathematical proof is one of the highest intellectual achievements of humankind. It contains the deepest, most complex and most rigorous arguments of which we are capable.

Until the last half century, mathematical proof was the exclusive preserve of human mathematicians. However, following the logical formalisation of proof and the invention of electronic computers, it has become possible to automate the process of proof. Initially, automatic theorem-proving computer programs were only capable of proving trivial theorems. But with the exponentially increasing speed and storage capacity of computers, and the development of more sophisticated theorem-proving software, it has now become possible to prove open conjectures by mechanical means.

These developments have raised questions about the nature of mathematical proof. Some have argued that mathematical proof is an essentially social process in which humans interact and convince each other of the correctness of their arguments. Not only are computer ‘proofs’ hard for humans to understand but computers are unable to take part in this social process, so it is argued that whatever theorem-proving computers do, it is not really mathematics. Some proofs, such as the Four Colour Theorem and Kepler’s Conjecture, have required essential computer assistance to check a large number of cases. Because this computer processing is inaccessible to human mathematicians, many of them have refused to accept these part-mechanical proofs.

On the other hand, computer scientists routinely use mechanical proof for the formal verification of the correctness of their computer programs. They argue that these verification proofs are so long and complicated, and humans so error prone, that only a completely computer-checked proof merits the level of confidence required for a safety- or security-critical application. Also, some mathematicians have found computer systems to be a useful experimental tool, which can do not just numeric calculations, but also symbolic, algebraic manipulation and graphical display of the results. Such tools can be highly suggestive, e.g. of new conjectures or approaches. This has generated fierce argument as to the role of experimentation within mathematics.

In October 2004, a group of mathematicians, computer scientists, logicians, sociologists, philosophers and others from many diverse disciplines came together at the Royal Society for a two-day debate on the issues listed above—and many related issues. It was a very well attended meeting and generated a lively and constructive debate. This journal records the debate. It contains not just the papers of the many prestigious speakers, but also a record of the discussions following the talks. Also presented are three position statements from the panel discussion ‘Formal versus rigorous proof for verification’. It provides a wealth of opposing viewpoints and insightful observations into the nature of mathematical proof.

In ‘Computing and the cultures of proving’, Donald MacKenzie outlines a ‘Sociology of mathematics’ by surveying the concepts of proof held by different communities of mathematicians and computer scientists. In particular, he contrasts preferences for mechanised versus non-mechanised proof and formal versus rigorous proofs. Henk Barendregt and Freek Wiedijk develop the dreams of those building automatic, logic-based theorem provers. In ‘The challenge of computer mathematics’ they argue for the inevitability of a collaborative approach to mathematics between humans and machines. In ‘What is a proof?’, Alan Bundy and colleagues try to find a middle way—an account of proof that, while logic-based and automatable, more closely emulates rigorous, human-constructed proofs than does the traditional, formal, logical account. Our panel discussion on ‘Formal versus rigorous proof for verification’ contrasts different approaches to proof even within the Computer Science community, whose interests lie primarily in applications of proof to the verification that complex computer systems meet their specifications. Ursula Martin gives an overview of this research area and its history. From the viewpoint of an industrial user of computer proof, Roderick Chapman discusses some of the pragmatic issues that need to be addressed for it to be used routinely by software engineers. Cliff Jones argues for the use of rigorous, as opposed to formal, proof in verification.

In ‘Highly complex proofs and implications of such proofs’, Michael Aschbacher estimates that the classification of finite simple groups is tens of thousands of pages long and is certain to contain errors. He argues that there will be other highly useful theorems without short elegant proofs and discusses how mathematics must evolve to address the issues this raises. Paul Cohen relates the history of the logical formalisation of mathematics, highlighting the contributions of Frege, Hilbert, Gödel and Skolem. In ‘Skolem and pessimism about proof in mathematics’, he argues that the Skolem–Löwenheim Theorem dealt a body-blow to Hilbert’s Programme and that a vast majority of complex conjectures are beyond the reach of reasoning. He then discusses the practical consequences of these observations for mathematics. Angus MacIntyre reflects on the interaction between mathematics, logic and computer-assisted theorem proving in his paper ‘The mathematical significance of proof theory’, but he detects a gulf of understanding between the different communities. E. Brian Davies defends ‘Pluralism in mathematics’, i.e. the view that classical mathematics, constructive mathematics, computer-assisted mathematics and various forms of finitistic mathematics can co-exist. Having been at various times in his life a pure mathematician, an applied mathematician and a computer scientist, Peter Swinnerton-Dyer also argues for a pluralistic attitude to proof and rigour. In ‘The justification of mathematical statements’ he argues that different standards of rigour are appropriate depending on the importance, the unexpectedness, the beauty and the application of the resulting theorem.

As both MacKenzie and MacIntyre observed, our meeting on the Nature of Mathematical Proof revealed a number of different cultures with many different views of what constitutes a proof. Despite these culture clashes, most participants enjoyed an enormously stimulating and fruitful discussion, with lots of avenues for further interaction and research. We hope that this will be just the beginning of an exciting multi-disciplinary exploration into the nature of mathematical proof.

A. Bundy
University of Edinburgh


Computing and the cultures of proving

BY DONALD MACKENZIE

School of Social & Political Studies, University of Edinburgh, Adam Ferguson Building, Edinburgh EH8 9LL, Scotland

([email protected])

This article discusses the relationship between mathematical proof and the digital computer from the viewpoint of the ‘sociology of proof’: that is, an understanding of what kinds of procedures and arguments count for whom, under what circumstances, as proofs. After describing briefly the first instance of litigation focusing on the nature of mathematical proof, the article describes a variety of ‘cultures of proving’ that are distinguished by whether the proofs they conduct and prefer are (i) mechanized or non-mechanized and (ii) formal proofs or ‘rigorous arguments’. Although these ‘cultures’ mostly coexist peacefully, the occasional attacks from within one on another are of interest in respect to what they reveal about presuppositions and preferences. A variety of factors underpinning the diverse cultures of proving are discussed.

Keywords: mathematical proof; sociology of proof; cultures of proving; computer-system verification; formal verification

Phil. Trans. R. Soc. A (2005) 363, 2335–2350; doi:10.1098/rsta.2005.1649
Published online 9 September 2005

One contribution of 13 to a Discussion Meeting Issue ‘The nature of mathematical proof’.

© 2005 The Royal Society

1. Introduction

The relationship between mathematical proof and the digital computer is at the heart of a number of major scientific and technological activities: see figure 1. Proofs are conducted about computers in at least three areas: some of those software systems upon which human lives depend; key aspects of some microprocessors; and some systems upon which national security depends. These ‘formal verifications’, as proofs about the ‘correctness’ (correspondence to specification) of the design of computer hardware or software are called, are themselves normally conducted using computer programs: either automated theorem provers (software especially designed to produce proofs, albeit often with human guidance) or model checkers (which check whether a representation of a system is a ‘model’ for the logical formula expressing the system’s specification, in other words an interpretation of the logic in which the formula is true). Mathematicians themselves have also turned to the computer for assistance in proofs of great complication, the most famous such cases being the four-colour theorem and Kepler sphere-packing conjecture. Automated theorem provers are also of considerable interest and importance within artificial intelligence. They raise, for example, the question of the extent to which a computer can replicate the thought processes of human mathematicians.

Figure 1. Proof and the computer. [The diagram groups its topics as follows: mathematical proofs about computers: computer systems on which lives depend; key aspects of (some) microprocessors; computer systems on which national security depends. Mathematical proofs using computers: automated theorem provers and model checkers; mathematical proofs of immense complication of detail; artificial intelligence: can a computer be an ‘artificial mathematician’?]


The resultant issues are deep, and have played out in historical episodes that I have discussed elsewhere (MacKenzie 2001). In this paper, I focus on the most intriguing question this area throws up for the sociology of science: is it possible to develop a sociology of mathematical proof, in other words a sociological analysis of what kinds of procedures and arguments count for whom, under what circumstances, as proofs?

The question is of interest sociologically because of a certain imbalance in the sociological analysis of science, which has focused predominantly on the natural sciences. Relatively little work has been done on the ‘deductive’ sciences of mathematics and logic, and sociological discussion of deductive proof—which is at the core of those disciplines—is sparse (see MacKenzie 2001 for references to the main existing work).

Perhaps, the reader may suspect, there has been little ‘sociology of proof’ because such sociology is impossible? Perhaps mathematical proof is an absolute matter, not subject to the kind of variation that would make it susceptible to sociological analysis. The history of mathematics, however, reveals substantial variation in the kinds of argument that have been taken as constituting mathematical proof (see the survey by Kleiner 1991). Eighteenth century work in calculus, for example, often relied upon manipulating infinitesimally small quantities or infinite series in ways that became unacceptable in the nineteenth century. Early twentieth century mathematics was driven by dispute over the acceptability, in proofs involving infinite sets, of the law of the excluded middle. (The law is that for any proposition p, ‘p or not-p’ must be true. Invocation of excluded middle permits non-constructive existence proofs, which demonstrate that a mathematical entity exists by showing that its non-existence would imply a contradiction.)
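A concrete illustration of the disputed proof style (an addition of mine, not part of MacKenzie's text) is the classic non-constructive existence argument below, sketched in LaTeX; it establishes that suitable numbers exist without ever identifying them.

% Classic non-constructive existence proof, relying on excluded middle:
% there exist irrational numbers a, b such that a^b is rational.
\begin{proof}
Let $c = \sqrt{2}^{\sqrt{2}}$. By the law of the excluded middle, $c$ is
either rational or irrational.
\emph{Case 1.} If $c$ is rational, take $a = b = \sqrt{2}$: both are
irrational and $a^b = c$ is rational.
\emph{Case 2.} If $c$ is irrational, take $a = c$ and $b = \sqrt{2}$: then
\[
  a^b = \bigl(\sqrt{2}^{\sqrt{2}}\bigr)^{\sqrt{2}}
      = \sqrt{2}^{\sqrt{2}\cdot\sqrt{2}} = \sqrt{2}^{\,2} = 2,
\]
which is rational. The pair exists in either case, yet the proof never
reveals \emph{which} case obtains: exactly the feature to which
constructivists objected.
\end{proof}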

The history of mathematics, therefore, suggests that mathematical proof is not the straightforward, absolute matter that it is often taken to be. In 1987, colleagues and I drew upon this evidence to make a prediction about the effort (by then widely pursued) to apply mathematical proof to computer systems. We noted that this effort involved moving mathematical proof into a commercial and regulatory arena. We speculated that the pressures of that arena would force potential variation in the meaning of proof out into the open, but that disputes about proof would no longer simply be academic controversies. We suggested that it might not be long before a court of law had to rule on what a mathematical proof is (Pelaez et al. 1987).

That prediction was nearly borne out in 1991, when litigation broke out in Britain over the application of mathematical proof to a microprocessor chip called VIPER (verifiable integrated processor for enhanced reliability), which had been developed by computer scientists working for the Ministry of Defence’s Royal Signals and Radar Establishment. At stake was whether the chain of mathematical reasoning connecting the detailed design of VIPER to its specification was strong enough and complete enough to be deemed a proof. Some members of the computer-system verification community denied that it was (Cohn 1989; Brock & Hunt 1990), and, largely for unconnected reasons, sales of VIPER were disappointing. Charter Technologies Ltd, a firm which had licensed aspects of VIPER technology from the Ministry of Defence, took legal action against the Ministry, alleging, amongst other things, that VIPER’s design had not been proven to be a correct implementation of its specification.

No ‘bug’ had been found in the VIPER chips; indeed, their design had been subjected to an unprecedented amount of testing, simulation, checking and mathematical analysis. At issue was whether or not this process, as it stood immediately prior to the litigation (considerable subsequent work was done on the VIPER verification), amounted to a mathematical proof. Matters of fact about what had or had not been done were not central; the key questions that had been raised by critics were about the status, adequacy and completeness, from the viewpoint of mathematical proof, of particular kinds of argument. With the Ministry of Defence vigorously contesting Charter’s allegations, the case failed to come to court only because Charter became bankrupt before the High Court heard it. Had it come to court, it is hard to see how the issue of what, in this context, mathematical proof consists in could have been avoided.

The VIPER controversy has been reported elsewhere (MacKenzie 1991), and a single episode has inevitable idiosyncrasies. Let me turn, therefore, to wider issues of the ‘sociology of proof’ raised by the domains listed in figure 1.

(a) Mechanized and non-mechanized proofs; formal and rigorous proofs

How might one characterize variations in the kinds of mathematical procedures or arguments that are taken, in the fields listed in figure 1, as constituting proofs? One dimension of that variation is very simple: it is whether a procedure is conducted, or an argument generated, by a human being or by a machine. Controversy in respect to this dimension has focused above all on machine-performed procedures that are too extensive or too complicated for unaided human beings to check. The dependence of the proof of the four-colour theorem on such procedures led to much debate as to whether it was indeed a genuine proof (see MacKenzie 2001). However, the traditional mathematical preference for proofs performed by human beings (or at least surveyable by human beings) is contested when it comes to verification of the design of computer systems. The proponents of automated verification have argued that mechanization is preferable to reasoning by unaided human beings, prone as we are to lapses of concentration and wishful thinking. One strand of the criticism of the claim of proof for VIPER, for example, was that key steps in the chain of argument had not been subjected to mechanical checking.

A second dimension of variation is also familiar: it is whether an argument is a ‘formal proof’ or a ‘rigorous argument’. A formal proof is a finite sequence of ‘well-formed’ (that is, to put it loosely, syntactically correct) formulae leading to the theorem, in which each formula either is an axiom of the formal system being used or is derived from previous formulae by application of the system’s rules of logical inference. These rules will normally be syntactic in form, such as the famous modus ponens: if p and ‘p implies q’ are formulae in the sequence, then q can be added to the sequence. The steps in a formal proof are thus mechanical applications of inference rules, and their correctness can therefore be checked without understanding the meaning of the formulae involved.
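To make this purely syntactic character concrete, here is a minimal proof checker in Python (an illustration of mine, not an artefact from the paper; the tuple encoding of formulae and the toy axiom set are simplifying assumptions). The checker matches the shapes of formulae and never interprets them.

# Minimal formal-proof checker: every step must be an axiom or follow from
# earlier steps by modus ponens. Formulae are nested tuples, e.g.
# ('->', 'p', 'q') stands for 'p implies q'. The check is purely syntactic.

def follows_by_modus_ponens(formula, earlier):
    # True if, for some earlier step p, the implication p -> formula
    # also occurs among the earlier steps.
    return any(('->', p, formula) in earlier for p in earlier)

def check_proof(steps, axioms):
    # Verify each step in order; reject the first unjustified one.
    accepted = []
    for i, formula in enumerate(steps):
        if formula in axioms or follows_by_modus_ponens(formula, accepted):
            accepted.append(formula)
        else:
            return False, 'step %d is not justified' % i
    return True, 'formal proof checked'

# Toy example: from the 'axioms' p and p -> q, derive q.
axioms = {'p', ('->', 'p', 'q')}
proof = ['p', ('->', 'p', 'q'), 'q']
print(check_proof(proof, axioms))   # (True, 'formal proof checked')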

Rigorous arguments, in contrast, are those arguments that are accepted by mathematicians (or other relevant specialists) as constituting mathematical proofs, but that are not formal proofs in the above sense. The proofs of ordinary Euclidean geometry, for example, are rigorous arguments, not formal proofs: even if they involve deducing a theorem from axioms (and some involve reasoning that is not, at least directly, of this form), the steps in the deduction are typically not merely applications of rules of logical inference. This is not simply a reflection of the antiquity of Euclidean geometry: articles in modern mathematics journals, whatever their subjects, almost never contain formal proofs. A very simple sketch of a ‘rigorous argument’ proof is provided by the famous ‘mutilated chessboard’ puzzle; see figure 2. The argument in the caption leaves one in no doubt of the correctness of the conclusion, yet it is not a formal proof (and it would not become a formal proof, in the sense in which the term is used in this paper, even if the everyday terms used in the puzzle were replaced by more precise mathematical equivalents).

‘Rigorous arguments’ are often called ‘informal proofs’. I avoid the latter term, however, because informal proof is often assumed to be inferior to formal proof, while preferences between the two are amongst the issues that a sociology of proof needs to investigate. Instead, I draw the notion of ‘rigorous argument’, in the sense in which the phrase is used here, from a UK Ministry of Defence Procurement Executive Standard governing safety-critical software (Ministry of Defence 1991). What fascinates me as a sociologist of science about the formal verification of computer systems is that a document such as a defence procurement standard is driven onto the quintessentially philosophical terrain of having to define what ‘proof’ is!


2. Cultures of proving

The two dimensions of mechanized versus non-mechanized proofs, and formal proofs versus rigorous arguments, allow a simple map of ‘cultures of proving’ (a term I draw from Livingston 1999, though he discusses only non-mechanized rigorous arguments). In figure 3 a variety of disciplines or more specialized cultures are located according to the forms of proof they practice or value. Mainstream automated theorem proving (discussed in detail in MacKenzie 2001) values, above all, mechanized formal proofs. Some departures from formality are found in practice—many automated theorem-proving systems employ ‘decision procedures’ (algorithms that determine whether formulae in particular mathematical domains are theorems) that do not generate formal proofs—but such departures are regarded at best as pragmatic necessities and at worst as reasons for condemnation. One theorem-proving specialist interviewed for the research on which this paper is based said that using unverified decision procedures was ‘like selling your soul to the Devil—you get this enormous power, but what have you lost? You have lost proof, in some sense’. (For details of the interviews drawn on here, see MacKenzie 2001.)
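For a flavour of what such a decision procedure does (again a sketch of mine, not an example from the paper): the truth-table method decides whether any propositional formula is a tautology, returning only a verdict rather than a checkable formal proof.

# A simple decision procedure: decide propositional tautologies by
# exhaustive truth-table evaluation. Formulae reuse the tuple encoding
# above: variable names, ('not', f), ('and', f, g), ('or', f, g), ('->', f, g).
from itertools import product

def evaluate(formula, env):
    if isinstance(formula, str):
        return env[formula]
    if formula[0] == 'not':
        return not evaluate(formula[1], env)
    a, b = evaluate(formula[1], env), evaluate(formula[2], env)
    return {'and': a and b, 'or': a or b, '->': (not a) or b}[formula[0]]

def variables(formula):
    if isinstance(formula, str):
        return {formula}
    return set().union(*(variables(f) for f in formula[1:]))

def is_tautology(formula):
    # Try every assignment of truth values; exponential in the variable count.
    vs = sorted(variables(formula))
    return all(evaluate(formula, dict(zip(vs, vals)))
               for vals in product([True, False], repeat=len(vs)))

# Peirce's law, ((p -> q) -> p) -> p, is a classical tautology:
print(is_tautology(('->', ('->', ('->', 'p', 'q'), 'p'), 'p')))   # True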

In contrast, in ‘ordinary’ mathematics (the mathematics conducted by most members of university mathematics departments, for example) proof is usually non-mechanized rigorous argument. Computers are, of course, playing an ever-increasing role in mathematical research but, as noted above, there remains a pervasive sense that, in regard to proof, computerized procedures that human beings cannot check in detail are inferior to arguments that mathematicians can grasp in their entirety. Arguably, the key issue is not mechanization per se but unsurveyability: human-generated proofs that are too extensive realistically to be grasped in full are also seen as problematic.

Figure 2. The mutilated chessboard (from Black 1946). Two diagonally opposite corner squares are excised from a chessboard. Can the remaining 62 squares be covered entirely by 31 dominoes, each of which can cover two squares (and no more than two squares)? The answer is ‘no’. An unmutilated chessboard has an equal number of squares (32) of each colour. The two excised squares must be the same colour, so the mutilated chessboard has two squares more of one colour than of the other. Whenever we lay a domino, it covers one square of each colour. If we can cover 60 squares by laying 30 dominoes, the last two uncovered squares must be the same colour, and the 31st domino therefore cannot cover them.
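The heart of that rigorous argument, the colour-parity obstruction, is itself easy to mechanize (a sketch of mine, for illustration only; it checks the parity condition the caption appeals to, rather than searching over all tilings).

# The mutilated-chessboard argument as a parity check: every domino covers
# one light and one dark square, so a region can be tiled only if it has
# equally many squares of each colour.

def colour_counts(squares):
    # (row + col) even is one colour, odd is the other.
    light = sum(1 for r, c in squares if (r + c) % 2 == 0)
    return light, len(squares) - light

board = {(r, c) for r in range(8) for c in range(8)}
mutilated = board - {(0, 0), (7, 7)}     # excise two opposite corners

light, dark = colour_counts(mutilated)
print(light, dark)    # 30 32: unequal, so no tiling by 31 dominoes exists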

Some computer-system verifications also fall within the quadrant of non-mechanized rigorous argument. Perhaps the key site of such verifications was IBM’s ‘Cleanroom’, an approach to the development of high-dependability software inspired by Harlan D. Mills (for details, see MacKenzie 2001). In the Cleanroom, proof was an explicitly human and intersubjective activity, a refinement of the familiar process of software review. A Cleanroom proof was an argument that convinced another human being: specifically, a designer’s or programmer’s argument that convinced his or her fellow team members or other reviewers of the correctness of his or her design or program, for example by convincing them that account had been taken of all possible cases and that the program or design would behave correctly in each case. The claim, ‘It is obvious’, counted as a proof, if to the reviewing team what was claimed was indeed self-evident. Even the use of mathematical notation was not an essential part of proof, except, in the words of Mills and colleagues, ‘in its effect on the person who is the experimental subject’ (Linger et al. 1979). That Cleanroom proof explicitly aimed to produce ‘subjective conviction’ was no argument against it, said its proponents, because that was ‘the only type of reasoned conviction possible’ (Linger et al. 1979). Mistakes were always possible, but systematic reasoning made human beings less error-prone, and systematic review by other people reduced mistakes even further.

Of course, human beings can perform formal proofs as well as rigorous-argument proofs. Prior to the digital computer, formal proofs had to be conducted by hand, as in the ‘logicist’ approach to the foundations of mathematics exemplified most famously by Whitehead & Russell (1910–1913).

                    formal proof                     rigorous argument

  mechanized        mainstream automated             ‘Hard’ artificial
                    theorem proving                  intelligence

  not mechanized    early logicism;                  ordinary mathematics;
                    Dijkstra’s calculational         IBM ‘Cleanroom’
                    proofs

Figure 3. Cultures of proving.


What is more surprising is that modern computer science contains a subculture within which formal proof is preferred to rigorous argument but in which proofs are conducted by hand rather than by machine. This culture of ‘calculational proof’ was inspired by the great Dutch theoretical computer scientist Edsger W. Dijkstra and, like early logicism, it occupies the lower left quadrant of figure 3. Dijkstra believed that mathematics, including the mathematics of computer science, should be performed formally, that is, ‘by manipulating uninterpreted formulae according to explicitly stated rules’ (Dijkstra & Scholten 1990). To do otherwise was, in Dijkstra’s view, to be ‘medieval’ (Dijkstra 1988). Yet Dijkstra was ‘not an enthusiast for the mechanization’ of proof, commenting: ‘Why delegate to a machine what is so much fun to do yourself?’ (MacKenzie 2001).

There is, indeed, a significant (and to the outsider, a surprising) current of ambivalence about mechanization in the culture of elite, theoretical computer science. The iconic representation of this ambivalence is in the matter of writing. Within this strand of computer science, the fountain pen, to others an archaic technology, became something of an icon. Dijkstra’s beautifully handwritten lecture notes and correspondence have become famous. One of Dijkstra’s students even hand-wrote, and published in handwritten form, his PhD thesis (van de Snepscheut 1985), to which Dijkstra contributed a handwritten foreword.

The final, upper right quadrant in figure 3 is mechanized rigorous argument. Formal proof has been relatively easy to automate. The application of rules of inference to formulae considered simply as strings of symbols can be implemented on a digital computer using syntactic pattern matching. The automation of generic rigorous argument, on the other hand, has been a far more difficult problem. Some parts of what human mathematicians do, such as algebraic manipulation, can relatively readily be mechanized: there are now widely used commercial programs that automate symbol manipulation in fields like algebra and calculus. There are, however, as yet no ‘artificial mathematicians’, in the sense of automated systems capable of handling the full spectrum of rigorous arguments used in different fields of mathematics. Development in this direction is a hard problem in artificial intelligence, one that some commentators (such as Penrose 1989) deny will ever be solved fully: for progress in this field, see Bundy et al. (2005).
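As a taste of the mechanized symbol manipulation described above (an illustration of mine; the open-source SymPy library stands in for the commercial systems the paper has in mind):

# Mechanized algebraic manipulation: symbolic calculus and simplification
# performed by program rather than by hand.
import sympy as sp

x = sp.symbols('x')
expr = sp.sin(x) * sp.exp(x)

print(sp.diff(expr, x))        # exp(x)*sin(x) + exp(x)*cos(x)
print(sp.integrate(expr, x))   # exp(x)*sin(x)/2 - exp(x)*cos(x)/2
print(sp.simplify((x**2 - 1) / (x - 1)))   # x + 1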

3. Conflicts over ‘proof’

Nearly all the time, the cultures of proving listed in figure 3 have coexisted peacefully alongside each other, often not interacting much. The institutional locations of the two most populous quadrants, mechanized formal proof and non-mechanized rigorous argument (in its ‘ordinary mathematics’ form), have differed, with the former based in university computer science departments, quasi-university research institutes (such as SRI International) and related industrial sectors. Only a small minority of academic mathematicians (that is, members of university mathematics departments) seem to use automated theorem provers, and almost none have contributed to the development of such systems (see MacKenzie 2001).


Certainly, the different forms of proving need not be taken to be in conflict. It is standard, for example, to take the view that a rigorous-argument proof is a sketch that can be translated into a formal proof. Nevertheless, a subtly different interpretation is possible, and was put forward by the mathematical logician Jon Barwise in a comment on the dispute begun by philosopher James Fetzer’s critique of program verification (Fetzer 1988). Barwise noted that Fetzer, as an ‘orthodox’ philosopher, and his critics, as proponents of automated theorem proving, took the canonical notion of proof to be formal proof. In this tacit agreement, said Barwise, both sides ‘stumble over a landmine left by the retreating formalists’; both believed that a ‘real proof’ was a formal one. ‘[A]t the risk of stepping on the toes of my fellow mathematical logicians’, Barwise argued that it was not. Formal proof was only a model of real proof, indeed a ‘severely impoverished’ model:

[T]here are many perfectly good proofs that are not modeled in any direct way by a formal proof in any current deductive system. For example, consider proofs where one establishes one of several cases and then observes that the others follow by symmetry considerations. This is a perfectly valid (and ubiquitous) form of mathematical reasoning, but I know of no system of formal deduction that admits of such a general rule. They can’t, because it is not, in general, something one can determine from local, syntactic features of a proof. [I]t could be that the best proofs (in the sense of being most enlightening or easiest to understand) of a program’s correctness will use methods, like symmetry considerations, that are not adequately modeled in the logician’s notion of formal proof, and so which would not be deemed correct by some automated proof checker designed around the formalist’s model (Barwise 1989, p. 849).

Differences of opinion over whether formal proof or rigorous argument constitutes ‘real proof’ indicate the potential for disagreement between cultures of proving. In practice, however, attacks from one culture of proving upon another are quite rare. They are interesting nevertheless, in what they reveal about the presuppositions of the culture from which the attack comes, rather than about the reality of the culture being attacked.

Statements of preference for rigorous-argument over formal proof, or vice versa, usually stop short of denying that the less favoured alternative is proof and of asserting that the corpus of knowledge verifiable only by its application is therefore defective and its contents are not theorems. Nevertheless, that denial can be found. In a famous attack upon program verification, DeMillo et al. (1979) argued, in effect, that because program verifications were formal proofs, and because what mathematicians did was rigorous argument not formal proof, program verifications were therefore not proofs and their results not theorems. In 1957, the logician Peter Nidditch came close to the opposite claim, when he argued that what mathematicians do is not fully valid proof:

In the whole literature of mathematics there is not a single valid proof in the logical sense. The number of original books or papers on mathematics in the course of the last 300 years is of the order of 10^6; in these, the number of even close approximations to really valid proofs is of the order of 10^1. In the relatively few places where a mathematician has seriously tried to give a valid proof, he has always overlooked at least some of the rules of inference and logical theorems of which he has made use and to which he has made no explicit reference. In addition, in these places, the mathematician has failed to pay sufficient critical attention to purely mathematical points of detail (Nidditch 1957, pp. v, 1, 6).

Nidditch’s comment was a critique of ordinary rigorous-argument mathematical proof from the viewpoint of logicism. Dijkstra offered a similar critique from the viewpoint of ‘calculational proof’. One of his famous, widely circulated, handwritten memoranda, dating from 1988, was caustically entitled ‘Real mathematicians don’t prove’. For Dijkstra, the struggle in computing between formalists and ‘real programmers’ (who, as Dijkstra put it, ‘don’t reason about their programs, for reasoning isn’t macho’) was part of a wider battle pervading ‘the rest of mathematics’ between formalists and ‘informalists’:

only they don’t call themselves by that negative name: presumably they present themselves as ‘the real mathematicians’—who constantly interpret their formulae and ‘reason’ in terms of the model underlying that interpretation.

By rejecting formalism, with its clear distinction between ‘provability’ and ‘the fuzzy metaphysical notion of “truth”’, mathematics remained according to Dijkstra ‘still a discipline with a sizeable pre-scientific component, in which the spirit of the Middle Ages is allowed to linger on’. Amongst its ‘medieval characteristics’ was that ‘how to do mathematics is not taught explicitly but only by osmosis, as in the tradition of the guilds’ (Dijkstra 1988).

Dijkstra’s ‘calculational’ proofs were themselves attacked by Panagiotis Manolios and J Strother Moore, a leading member of the automated theorem proving community, on the grounds that such proofs were not ‘formal’, but ‘rigorous arguments in a strict format ... where the notion of proof is “convincing enough for your fellow mathematicians”’ (Manolios & Moore 2001). Two points about this critique are of interest. First, proof as conducted in ordinary mathematics is here taken as inferior to formal proof, and not as establishing what real proof is (as DeMillo, Lipton and Perlis had argued). Second, Manolios and Moore’s argument can be taken as suggesting that the non-mechanized, formal quadrant of figure 3 is unstable: that in the absence of the ‘discipline’ imposed by mechanized systems, human beings cannot realistically be expected to produce extensive, entirely formal proofs. Elements of higher-level, rigorous-argument reasoning will inevitably creep in.

Whether or not this suggestion is valid, it can be noted that mechanized systems have enabled a considerable expansion of the domain of formal proof. In the first half of the twentieth century, the notion of formal proof was largely a tool of metamathematics: a way of modelling ‘proof’ so as to permit precise reasoning about it. It was not a viable practical alternative to rigorous-argument proof in anything other than very limited mathematical domains. In the late twentieth century, however, the advent of automated theorem provers and proof-checkers permitted the formal proof of significant parts of mathematics and mathematical logic, including quite difficult theorems like Gödel’s incompleteness theorem (Shankar 1994). At least two notions of proof—non-mechanized rigorous argument, and mechanized formal proofs—are now practised on a relatively large scale, and are available to be counterposed by those who wish for whatever reason to do so.

4. Disciplines and applications

One factor underpinning the cultures of proving listed in figure 3 is the structure of academic disciplines, in particular the separation between mathematics and philosophy and the resultant sometimes uneasy interstitial situation of logic. While most mathematicians are in practice committed to rigorous-argument proof, formal proof has become the canonical notion of ‘proof’ in modern philosophy, and Barwise is almost certainly correct in suggesting that most logicians also adhere to this view.

Computer science appears heterogeneous in this respect. It is the disciplinary home of mechanized formal proof, and many logicians have found posts in computer science easier to obtain than in their parent discipline, with at least some of them bringing with them a preference for formal proof. However, computer science also generated the celebrated attack on mechanized formal proof by DeMillo et al. (1979). Furthermore, commitment to artificial intelligence can create a preference for rigorous-argument proof. If one’s goal is automated replication of human reasoning, then the mechanization of rigorous argument can be seen as a more appropriate objective than the mechanization of formal proof, for all the latter’s greater technological tractability.

However, overall factors of this kind should not be overemphasized: sometimes very specific factors condition the appropriateness of particular notions of proof. For example, because at the time of the VIPER lawsuit the formal, mechanized proof of the correctness of its design was incomplete, defence of the claim of proof for VIPER was in effect forced implicitly to defend rigorous-argument proof. One such defence was mounted by a leading figure in the UK software industry, Martyn Thomas, founder of the software house Praxis. ‘We must beware’, he wrote, ‘of having the term “proof” restricted to one, extremely formal, approach to verification. If proof can only mean axiomatic verification with theorem provers, most of mathematics is unproven and unprovable. The “social” processes of proof are good enough for engineers in other disciplines, good enough for mathematicians, and good enough for me.... If we reserve the word “proof” for the activities of the followers of Hilbert [David Hilbert, leader of “formalism” within mathematics], we waste a useful word, and we are in danger of overselling the results of their activities’ (MacKenzie 2001). Although Thomas was in a sense defending proof as conducted by mathematicians, he was not himself a mathematician: he trained as a biochemist before entering the computer industry.

Nor are the cultures of proving depicted in figure 3 homogeneous. Our symposium revealed differences between mathematicians in views of ‘proof’, and automated theorem proving and formal verification likewise have their divides. One such divide, alluded to above, concerns the use in proof of decision procedures that have not themselves been subject to formal verification. This is one manifestation of a more general set of divides concerning the attitude to be taken to the fact that automated theorem provers are themselves quite complicated computer programs which may contain faults in design or implementation. In interviews for this research, designers of automated theorem provers often reported experience of ‘bugs’ in their systems that would have allowed ‘theorems’ that they knew to be false nevertheless to be proven. Such bugs were not large in number, they were corrected whenever they were discovered, and I know of only one case in which a theorem-proving bug caused a false result whose falsity was not detected immediately (it led to a claim of an automated proof of the Robbins conjecture—that Robbins algebras are Boolean—that later had to be withdrawn; see MacKenzie 2001). But no designer seemed able to give an unequivocal guarantee that no such bugs remained.

Reactions to the issue varied amongst those interviewed. One interviewee (Alan Robinson, developer of the fundamental theorem-proving procedure of ‘resolution’) suggested that the possibility of unsoundness in the design of theorem provers indicates that the overall enterprise of formal verification is flawed, because of what he has come to believe to be the impoverishment of the formal notion of proof:

You’ve got to prove the theorem-proving program correct. You’re in a regression, aren’t you? ... That’s what people don’t seem to realize when they get into verification. They have a hairy great thing they’re in doubt about, so they produce another hairy great thing which is the proof that this one’s OK. Now what about this one which you’ve just [used to perform the proof]? ... I say that serves them jolly well right.

That is not the response of program and hardware verification ‘insiders’. While paying considerable attention to soundness, they feel that theorem-prover bugs are not important practical worries compared to ensuring that the specification of a system expresses what, intuitively, is intended:

If you ... ask where the risks are, and what are the magnitudes of the risks, the soundness of the logic is a tiny bit, a really tiny bit, and the correctness of the proof tool implementing the logic is slightly larger [but] actually ... quite a small risk.

As that last quotation reminds us, automated theorem provers are developed not just as an important intellectual exercise in its own right, but to support verification in contexts in which hardware or software design faults can be fatal, can compromise national security, or can be very expensive in their consequences. For many years, the most important single source of funding for the development of automated theorem provers was the national security community, in particular the US Defense Advanced Research Projects Agency and National Security Agency. The demands of national security—in particular the desire for a theorem-proving system that prevents a hostile agent in a development team from constructing a spurious ‘proof’ of security—have influenced how at least some theorem-provers have been developed (see MacKenzie 2001).

A more recent influence is more generic. A major barrier to the practical use of theorem provers in computer-system verification is that their use requires large amounts of highly skilled human input. The theorem provers used in verification are automated, not automatic. Human beings guide them, for example by breaking up a desired proof into a structure of lemmas that are within the prover’s capacity, a task that requires a grasp both of what needs to be proved and of how the prover goes about constructing proofs. It is often a slow, painstaking and expensive process.

In contrast, model checkers and other systems that implement decision procedures are automatic. Human skill may still be needed in order to represent the design of a system and its specification in such a way that model checking is feasible, but once that is done a model checker is effectively a ‘push button’ device. The attractiveness in an industrial context of such an approach is obvious. Accordingly, in recent years, the goal of automatic rather than human-guided operation has transformed research efforts in the field of automated verification. Model checking and decision procedures have become an increasingly dominant focus of attention (Alan Bundy, personal communication). The practical demands of successful industrial application have thus reshaped research in automated verification.
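To illustrate the ‘push button’ character of this approach (a deliberately tiny sketch of mine, under simplified assumptions, not an industrial tool): an explicit-state model checker exhaustively explores the reachable states of a finite transition system and reports whether any of them violates a safety property.

# Minimal explicit-state model checking: breadth-first search over all
# reachable states, testing a safety property in each. Toy system: a counter
# modulo 4 with a reset action; the property is that the counter stays in range.
from collections import deque

def transitions(state):
    # Successors: increment modulo 4, or reset to 0.
    return {(state + 1) % 4, 0}

def safe(state):
    return 0 <= state < 4      # the safety property to verify

def model_check(initial, transitions, safe):
    # Return a reachable unsafe state (a counterexample), or None if safe.
    seen, frontier = {initial}, deque([initial])
    while frontier:
        s = frontier.popleft()
        if not safe(s):
            return s
        for t in transitions(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return None

print(model_check(0, transitions, safe))   # None: the property holds

Once the system and the property are encoded, no further human guidance is needed; that is the sense in which such tools are ‘push button’.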

5. Conclusion

In a world increasingly dependent upon computer systems, the diverse domains summarized in figure 1 are of obvious practical importance. This article has argued that they also provide fruitful material for the development of a sociology of proof. Alongside the cultures of proving that constitute human-performed mathematics have grown up the other approaches listed in figure 3, as well as approaches based upon decision procedures (which do not fit within figure 3’s grid because they involve mechanized procedures that neither produce formal proofs nor resemble rigorous arguments).

The health of the domains of research listed in figure 1 is thus not simply of practical importance. These domains also constitute a set of experiments in the meaning of deductive ‘proof’. They already have a rich history that demands far more attention from historians of science than it has received. Their future practical fortunes, and the controversies they will surely spark, should also offer fruitful terrain for sociologists of science in the years to come.

The writing of this paper was supported by DIRC, the Interdisciplinary Research Collaboration on the Dependability of Computer-Based Systems (UK Engineering and Physical Sciences Research Council grant GR/N13999). The original interviewing was supported by the UK Economic and Social Research Council under the Program in Information and Communication Technologies (A35250006) and research grants R000234031 and R00029008; also by the Engineering and Physical Sciences Research Council under grants GR/J58619, GR/H74452 and GR/L37953. I owe the example of the mutilated chessboard to Alan Robinson.

References

Barwise, J. 1989 Mathematical proofs of computer system correctness. Notices Am. Math. Soc. 36, 844–851.

Black, M. 1946 Critical thinking: an introduction to logic and scientific method. New York: Prentice-Hall.

Brock, B. & Hunt, W. A. 1990 Report on the formal specification and partial verification of the VIPER microprocessor. Austin, Texas: Computational Logic, Inc.

Bundy, A., Jamnik, M. & Fugard, A. 2005 What is a proof? Phil. Trans. R. Soc. A. (doi:10.1098/rsta.2005.1651.)

Cohn, A. 1989 The notion of proof in hardware verification. J. Automated Reasoning 5, 127–139. (doi:10.1007/BF00243000.)

Ministry of Defence 1991 Interim defence standard 00–55: the procurement of safety critical software in defence equipment. Glasgow: Ministry of Defence, Directorate of Standardization.

DeMillo, R., Lipton, R. & Perlis, A. 1979 Social processes and proofs of theorems and programs. Commun. ACM 22, 271–280. (doi:10.1145/359104.359106.)

Dijkstra, E. W. 1988 Real mathematicians don’t prove. Handwritten memo, EWD1012 (Austin, Texas, January 24).

Dijkstra, E. W. & Scholten, C. S. 1990 Predicate calculus and program semantics. New York: Springer.

Fetzer, J. 1988 Program verification: the very idea. Commun. ACM 31, 1048–1063. (doi:10.1145/48529.48530.)

Kleiner, I. 1991 Rigor and proof in mathematics: a historical perspective. Math. Mag. 64, 291–314.

Linger, R. C., Mills, H. D. & Witt, B. I. 1979 Structured programming: theory and practice. Reading, MA: Addison-Wesley.

Livingston, E. 1999 Cultures of proving. Soc. Stud. Sci. 29, 867–888.

MacKenzie, D. 1991 The fangs of the VIPER. Nature 352, 467–468. (doi:10.1038/352467a0.)

MacKenzie, D. 2001 Mechanizing proof: computing, risk, and trust. Cambridge, MA: MIT Press.

Manolios, P. & Moore, J. S. 2001 On the desirability of mechanizing calculational proofs. Inf. Proc. Lett. 77, 173–179. (doi:10.1016/S0020-0190(00)00200-3.)

Nidditch, P. H. 1957 Introductory formal logic of mathematics. London: University Tutorial Press.

Pelaez, E., Fleck, J. & MacKenzie, D. 1987 Social research on software. Paper presented to workshop of the Economic and Social Research Council, Programme on Information and Communication Technologies, Manchester, December.

Penrose, R. 1989 The emperor’s new mind: concerning computers, minds and the laws of physics. Oxford University Press.

Shankar, N. 1994 Metamathematics, machines and Gödel’s proof. Cambridge University Press.

van de Snepscheut, J. L. A. 1985 Trace theory and VLSI design. Berlin: Springer.

Whitehead, A. N. & Russell, B. 1910–1913 Principia mathematica. Cambridge University Press.

Discussion

D. B. A. EPSTEIN (Department of Mathematics, University of Warwick, UK). Is it feasible (in some technical sense) to formalize a typical mathematical proof? Can one estimate the complexity of the process of formalization? Can one prove, for example, that it is NP-hard (to formalize)? This requires formalization of the question itself.

D. MACKENZIE. It does appear to be feasible in practice to formalize many typical mathematical proofs, at least the simpler such proofs. However, the process generally has to be guided by human beings: today’s automatic systems are usually quite unable to handle any other than relatively simple cases. The computational complexity of theorem proving is undoubtedly part of the reason. The underlying formal results are well beyond my competence as a sociologist, but I believe that amongst complexity-theory results in mathematical domains relevant to automated theorem-proving are (a) that the problem of checking whether formulae in propositional logic are tautologies is NP-complete, and (b) that the complexity of the decision problem in Presburger arithmetic is worse than exponential. Both results suggest constraints on the usefulness of ‘brute force’ searches for proofs. However, since complexity-theory constraints do not stop human beings proving ‘hard’ theorems, it remains possible that progress in automated reasoning techniques may lead to systems with far greater capacities than those of the present.

S. COLTON (Department of Computing, Imperial College London, UK). Does mainstream mathematicians’ seemingly genuine difficulty (or inability) to introspect on the processes they use to generate theorems and proofs add to the reason why automating rigorous argument is difficult for artificial intelligence?

D. MACKENZIE. Again, I should emphasize that I am not a technical specialist in this area, but I suspect the answer to Colton’s question is ‘yes’. One way of designing automated theorem provers to circumvent the ‘combinatorial explosion’ generated by ‘brute force’ searches would be to guide the search for proofs by incorporating the ‘heuristics’ used by human mathematicians. Attempts to do this do not, however, seem to have been consistently successful.

A. V. BOROVIK (School of Mathematics, University of Manchester, UK). The culture of (traditional) mathematics is that of openness; proofs of theorems are supposed to be open to everyone to check. Can we trust proofs produced by a commercial company on behalf of another company and kept out of the public domain?

D. MACKENZIE. In brief, no! However, the practical consequences of this may be less than might be imagined. Many of the benefits of subjecting designs to ‘formal verification’ come from the very effort to do so, for example in the way it typically brings design faults to light. The finished product of formal verification—the ‘proof object’—may thus be less important than the process of constructing it. It might also be naïve to think that many such proof objects will be subject to detailed scrutiny by people not directly involved, even if the proof objects are publicly available. Many proof objects are far from ‘readable’ in the way in which a traditional mathematical proof is readable. Nevertheless, proofs of safety-critical systems clearly should be in the public domain, and it may not be utopian to hope that eventually it will be common for them to be checked by automated proof-checking systems different from those on which they were produced.

J. G. HENDERSON (Program Notes Ltd, Pinner, Middlesex, UK). Is mathematical proof becoming an ‘act of faith’ within our culture? e.g. whilst I ‘believe’ the four-colour theorem to be true—as a software engineer, I have yet to see the computer program’s design and code for it!

D. MACKENZIE. Any complex society, such as ours, involves an extensive division of labour in which much has to be taken on trust. Few of us pause to check the soundness of the aerodynamic design of an airliner before we board it or verify the correctness of the implementation of the cryptographic protocols before submitting our credit card number to a website! There is a sense in which we normally have no practical alternative but to trust that such things have been thoroughly checked by appropriate experts. Cutting-edge research mathematics is often going to be comprehensible to only a limited number of specialists: there are many proofs (for example, the proof of Fermat’s last theorem) that even many professional mathematicians outside the relevant specialist area will struggle to understand. Increased specialization within mathematics almost certainly means that the number of such cases is far greater now than a century or two centuries ago, and in those cases non-specialist mathematicians may indeed have no practical alternative but to trust that the scrutiny of putative proofs by their specialist colleagues has been thorough. I do not see it as a ‘problem’: to my mind, it is an inevitable consequence of specialization.

G. WHITE (Computer Science, Queen Mary, University of London, UK). What constitutes 'formal'? There are calculations which prove theorems, but which can only be proved to be rigorous by informal argument. For example, the two-dimensional notations for traced monoidal categories (Joyal et al. 1996). And one is surely only tempted to think that there is a unitary concept of 'formal' if one tacitly identifies the formal with the foundational—an extremely problematic assumption.

D. MACKENZIE. The word 'formal' is a contested one, and it can certainly be used in senses other than that in my paper. I believe that it is also the case that when mathematicians and logicians seek to prove theorems about matters such as the soundness, completeness and consistency of formal systems, the proofs they produce are, in the terminology of the paper, often (indeed usually) 'rigorous argument' proofs, not formal proofs.

R. CHAPMAN (SPARK Team, Praxis, Bath, UK). When the presenter revealed the source of his definitions of 'formal proof' and 'rigorous argument' to be the MoD Interim Def-Stan 00–55, why did people laugh?

D. MACKENZIE. I suspect the reason for laughter was the way in which a fundamental 'philosophical' issue (the nature of 'proof') appeared in an apparently bureaucratic document such as a defence procurement standard. It is, of course, part of the fascination of this area that a document such as a procurement standard has to venture into the quintessentially philosophical terrain of having to say what 'proof' is.

R. POLLACK (School of Informatics, University of Edinburgh, UK). Donald MacKenzie showed a 2×2 matrix of different 'proof cultures'. I pointed out that MacKenzie's cultures had many subcultures. In particular I objected to MacKenzie's use of the phrase 'automated proof' for the culture of formal proofs using computers, as a major group does not use automated search for proofs as its main approach, but machine checking of proofs developed by human users interactively with machine support.

D. MACKENZIE. I plead guilty: Pollack is correct in regard to subcultures. In a brief paper, it is impossible to do them justice. I hope I have done a little better in D. MacKenzie, Mechanizing proof: computing, risk, and trust (Cambridge, MA: MIT Press, 2001). In regard to terminology, I distinguish between 'automated' proof (for example, 'automated theorem prover') and fully 'automatic' proof. As I use the word, 'automated' proof includes the large and important category identified by Pollack: proofs which are developed on computerized systems, and checked by such systems, but in which proof construction is guided, often in detail, by human beings. Perhaps 'semi-automated' would thus be a better term, but the field uses the term 'automated theorem prover', not 'semi-automated', and I followed the field's usage. As I noted in response to Epstein's question, the capacities of 'semi-automated', human-guided systems are currently much greater than those of fully automatic systems.

M. ATIYAH (Department of Mathematics and Statistics, University of Edinburgh, UK). In both mathematics and physical science there is a hierarchical structure, involving a fundamental level and a higher level. In physics or chemistry we have quantum mechanics with the Schrödinger equation, but for most of chemistry or solid state physics one cannot in practice deduce everything from the foundations. In mathematics we have formal proof, but usual mathematical proof ('rigorous reasoning') cannot be reduced to formal proof. In both physics and mathematics we have to live with these imperfections.

D. MACKENZIE. Atiyah is right, and his analogy is elegant and apposite. I would note, however, that the situation he describes is changing and will continue to change. The growing capacities of 'semi-automated' proving systems mean that although many areas, especially of current research mathematics, remain intractable, vastly larger numbers of rigorous argument proofs have now been reduced to formal proofs than was the case 30 years ago. Interestingly, though, the outcome of this effort supports the defence of rigorous-argument proof that I think underpins Atiyah's comment. The semi-automated formalization of rigorous-argument proofs only very seldom uncovers serious mistakes in 'usual mathematical proof', at least in the context of well-trodden mathematical terrain. Such proof thus seems quite robust, and it need not be conceded that reliance upon it is an 'imperfection', at least in the sense of having deleterious practical consequences. The practical benefits of automatic or semi-automated proof-checking may therefore lie not primarily in mathematics, but in computer systems development, where conventional practices are far from robust in the sense of being capable of reliably producing systems free from serious flaws.
Additional reference

Joyal, A., Street, R. & Verity, D. 1996 Traced monoidal categories. Math. Proc. Camb. Phil. Soc. 119, 447–468.


The challenge of computer mathematics

BY HENK BARENDREGT AND FREEK WIEDIJK

Radboud University Nijmegen, 6500 GL Nijmegen, The Netherlands ([email protected])

Progress in the foundations of mathematics has made it possible to formulate all thinkable mathematical concepts, algorithms and proofs in one language and in an impeccable way. This is not in spite of, but partially based on, the famous results of Gödel and Turing. In this way statements are about mathematical objects and algorithms, proofs show the correctness of statements and computations, and computations are dealing with objects and proofs. Interactive computer systems for a full integration of defining, computing and proving are based on this. The human defines concepts, constructs algorithms and provides proofs, while the machine checks that the definitions are well formed and the proofs and computations are correct. Results formalized so far demonstrate the feasibility of this 'computer mathematics'. There are also very good applications. The challenge is to make the systems more mathematician-friendly, by building libraries and tools. The eventual goal is to help humans to learn, develop, communicate, referee and apply mathematics.

Keywords: computer mathematics; formalized proofs; proof checking
Phil. Trans. R. Soc. A (2005) 363, 2351–2375
doi:10.1098/rsta.2005.1650
Published online 12 September 2005

One contribution of 13 to a Discussion Meeting Issue 'The nature of mathematical proof'.

© 2005 The Royal Society

1. The nature of mathematical proof

Proofs in mathematics have come to us from the ancient Greeks. The notion of proof is said to have been invented by Thales (ca 624–547 BC). For example, he demonstrated that the angles at the base of any isosceles triangle are equal. His student Pythagoras (ca 569–475 BC) went on to prove, among other things, the theorem bearing his name. Pythagoras started a society of followers, half religious, half scientific, that continued after he passed away. Theodorus (465–398 BC), a member of the society, taught the philosopher Plato (428–347 BC) the irrationality of $\sqrt{2}, \sqrt{3}, \sqrt{5}, \sqrt{6}, \ldots, \sqrt{15}, \sqrt{17}$. Plato emphasized to his students the importance of mathematics, with its proofs that show non-obvious facts with a clarity such that everyone can understand them. In Plato's dialogue Meno, a slave was requested by Socrates (469–399 BC) to listen and answer, and together, using the maieutic method, they came to the insight that the size of the long side of an isosceles right-angled triangle is, in modern terminology, $\sqrt{2}$ times the size of the shorter side. Not much later the subject of mathematics had evolved to a sufficient degree that Plato's student Aristotle (384–322 BC) could reflect on this discipline. He described the axiomatic method as follows. Mathematics consists of objects and of valid statements. Objects are defined from previously defined objects; in order to be able to get started one has the primitive objects. Valid statements are proved from other such statements; in order to get started one has the axioms.

Euclid (ca 325–265 BC) wrote just a few decades later his monumental Elements, describing geometry in this axiomatic fashion. Besides that, the Elements contain the first important results in number theory (theory of divisibility, prime numbers, factorization) and even Eudoxos' (408–355 BC) account of treating ratios (which was later the inspiration for Dedekind (1831–1916) to give a completely rigorous description of the reals as cuts of rational numbers).

During the course of the history of mathematics, proofs increased in complexity. In particular, in the 19th century some proofs could no longer be followed easily by just any other capable mathematician: one had to be a specialist. This started what has been called the sociological validation of proofs. In disciplines other than mathematics the notion of peer review is quite common. Mathematics for the Greeks had the 'democratic virtue' that anyone (even a slave) could follow a proof. This somewhat changed after the complex proofs appeared in the 19th century that could only be checked by specialists. Nevertheless, mathematics kept developing and, having enough stamina, one could decide to become a specialist in some area. Moreover, one did believe in the review by peers, although occasionally a mistake remained undiscovered for many years. This was the case, e.g. with the erroneous proof of the Four Colour Conjecture by Kempe (1879).

In the 20th century, this development went to an extreme. There is the complex proof of Fermat's Last Theorem by Wiles. At first the proof contained an error, discovered by Wiles himself, and later his new proof was checked by a team of twelve specialist referees.1 Most mathematicians have not followed in detail the proof of Wiles, but feel confident because of the sociological verification. Then there is the proof of the Classification of the Finite Simple Groups. This proof was announced in 1979 by a group of mathematicians led by Gorenstein. The proof consisted of a collection of connected results written down in various places, totalling 10 000 pages. The proof also relied on 'well-known' results, and it turned out that not all of these were valid. Work towards improving the situation has been performed, and in Aschbacher (2004) it is announced that at least this author believes in the validity of the theorem. Finally there are the proofs of the Four Colour Theorem, Appel & Haken (1977a,b) and Robertson et al. (1996), and of Kepler's Conjecture, Hales (in press). All these proofs use a long computation performed on a computer. (Actually Aschbacher (2004) believes that at present the proof of the Classification of the Finite Simple Groups also relies on computer-performed computation.) The situation is summed up in table 1.

A very different development, defended by Zeilberger and others, consists of admitting proofs where the result is not 100% certain, but, say, 99.9999999999% certain. Examples of such proofs concern the primality of large numbers; see Miller (1976) and Rabin (1980).
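To convey how such a 'probabilistic proof' works in practice, here is a sketch, of our own making, of the Miller–Rabin idea in Haskell; it is an illustration of the technique behind the cited papers, not code from them. Each random witness that fails to expose $n$ as composite multiplies the confidence in primality, the error probability after $k$ rounds being at most $4^{-k}$.

```haskell
import System.Random (randomRIO)
import Control.Monad (replicateM)

-- b^e mod m by repeated squaring.
powMod :: Integer -> Integer -> Integer -> Integer
powMod _ 0 _ = 1
powMod b e m
  | even e    = (h * h) `mod` m
  | otherwise = (b * h * h) `mod` m
  where h = powMod b (e `div` 2) m

-- Write n - 1 = 2^r * d with d odd.
decompose :: Integer -> (Int, Integer)
decompose n = go 0 (n - 1)
  where go r d | even d    = go (r + 1) (d `div` 2)
               | otherwise = (r, d)

-- One round with witness a: True means 'n looks prime to a'.
looksPrime :: Integer -> Integer -> Bool
looksPrime n a = x == 1 || (n - 1) `elem` take r squares
  where (r, d)  = decompose n
        x       = powMod a d n
        squares = iterate (\y -> (y * y) `mod` n) x

-- k rounds with random witnesses; a composite n survives them all
-- with probability at most 4^(-k).
isProbablyPrime :: Int -> Integer -> IO Bool
isProbablyPrime k n
  | n < 4     = return (n == 2 || n == 3)
  | even n    = return False
  | otherwise = do
      witnesses <- replicateM k (randomRIO (2, n - 2))
      return (all (looksPrime n) witnesses)
```

For example, isProbablyPrime 40 (2^89 - 1) returns True (2^89 − 1 is a Mersenne prime), and a composite number passes 40 rounds with probability below 4^(−40), which is certainty of the order quoted above.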

In this situation the question arises whether there has been a necessary devaluation of proofs. One may fear that the quote of Benjamin Peirce (1809–1880), 'Mathematics is the science which draws necessary conclusions', may no longer hold. Scientific American even ventured an article called 'The Death of Proof'; see Horgan (1993). The Royal Society Discussion Meeting 'The Nature of Mathematical Proof' (18–19 October 2004) had a more open-minded attitude and genuinely wanted to address the question. We will argue below that proofs remain alive and kicking and at the heart of mathematics. There is a sound methodology to ensure the full correctness of theorems with large proofs, even if they depend on complex computations, like the Four Colour Theorem, or on a sociological verification, like the Classification of the Finite Simple Groups.

1 One of these referees told us the following. 'If an ordinary non-trivial mathematical paper contains an interesting idea and its consequences and obtains "measure 1", then Wiles' proof can be rated as having measure 156.'

(a ) Phenomenology

From where does the confidence come that is provided by a proof in mathematics? When a student asks the teacher: 'Sir, am I allowed to do this step?', the answer we often give is 'When it is convincing, both for you and me!'. Mathematics is rightly considered as the most exact science. It is not too widely known to outsiders that this certainty eventually relies on a mental judgement. It is indeed the case that proofs and computations are a warranty for the exactness of mathematics. But both proofs and computations need a judgement that the performed steps are correct and applicable. This judgement is based on a trained form of our intuition. For this reason Husserl (1901), and also Gödel (1995), and notably Bernays in Wang (1997, p. 337, 10.2.7), emphasize the phenomenological character of the act of doing mathematics.

(b ) Computation versus intuition

In Buddhist psychology one distinguishes discursive versus intuitive knowledge. In order to explain this a contemporary example may be useful. Knowing physics, one can calculate the range of angles a bike rider may use while making a right turn. This is discursive knowledge; it does not enable someone to ride a bike. On the other hand, a person who knows how to ride a bike 'feels' the correct angles by intuition, but may not be able to compute them. Both forms of knowledge are useful and probably use different parts of our brain.

For the mental act of doing mathematics one may need some support. In fact, before the Greek tradition of proofs, there was the Egyptian–Chinese–Babylonian tradition of mathematics as the art of computing. Being able to use computational procedures can be seen as discursive knowledge. This aspect is often called the 'algebraic' side of mathematics. On the other hand, proofs often rely on our intuition. One speaks loosely about the intuitive 'geometric' side of mathematics.

Table 1. Theorems and their verification

verifiable by               theorems
lay person/student          $\sqrt{2}$ is irrational; there are infinitely many primes
competent mathematician     fundamental theorem of algebra
specialist                  Fermat's last theorem
group of specialists        classification of the finite simple groups
computer                    four colour theorem, Kepler's conjecture

A computation like 13 338 × 3 145 727 = 41 957 706 726 needs to be done on paper or by some kind of computer (unless we are an idiot savant; this computation is related to the famous 'Pentium bug' appearing in 1994). Symbolic manipulations, like multiplying numbers or polynomials, performing symbolic integrations and arbitrary other algebraic computations, may not be accompanied by intuition. Some mathematicians like to use their intuition, while others prefer algebraic operations. Of course knowing both styles is best. In the era of Greek mathematics, at first the invention of proofs with its compelling exactness drew attention away from computations. Later, in the work of Archimedes (287–212 BC), both computations and intuition did excel.

The story repeated itself some two millennia later. The way in which Newton (1643–1727) introduced calculus was based on the solid grounds of Euclidean geometry. On the other hand, Leibniz (1646–1716) based his calculus on infinitesimals that had some dubious ontological status (do they really exist?). But Leibniz' algebraic approach did lead to many fruitful computations and new mathematics, as witnessed by the treasure of results by Euler (1707–1783). Infinitesimals did lead to contradictions. But Euler was clever enough to avoid these. It was only after the foundational work of Cauchy (1789–1857) and Weierstrass (1815–1897) that full rigour could be given to the computational way of calculus. That was in the 19th century, and mathematics bloomed as never before, as witnessed by the work of Gauss (1777–1855), Jacobi (1804–1851), Riemann (1826–1866) and many others.

During the last third of the 20th century the 'schism' between computing and proving reoccurred. Systems of computer algebra, being good at symbolic computations, were at first introduced for applications of mathematics in physics: a pioneering system is Schoonschip, see Veltman (1967), which helped win a Nobel prize in physics. Soon they became useful tools for pure mathematics. Their drawback is that the systems contain bugs and cannot state logically necessary side-conditions for the validity of the computations. On the other hand, systems for proof-checking on a computer have been introduced, the pioneer being Automath of de Bruijn, see Nederpelt et al. (1994). These systems are able to express logic, and hence necessary side conditions, but at first they were not good at making computations. The situation is changing now, as will be seen below.

(c ) Computer science proofs

Programs are elements of a formal (i.e. precisely defined) language and thereby they become mathematical objects. It was pointed out by Turing (1949) that one needs a proof to show that a program satisfies some desired properties. This method was refined and perfected by Floyd (1967) and Hoare (1969). Not all software has been specified, let alone proven correct, as it is often hard to know exactly what one wants from it. But for parts of programs and for some complete programs that are small but vital (like protocols), proofs of correctness have been given. The methodology of (partially) specifying software and proving that the required property holds for the program is called 'Formal Methods'.
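To give a flavour of the Floyd–Hoare style (with an example of our own, not one from the cited papers): a specification is a triple $\{P\}\ C\ \{Q\}$, read as 'if the precondition $P$ holds and the command $C$ terminates, then the postcondition $Q$ holds afterwards'. The correctness of a three-assignment swap, for instance, is the provable triple

$$\{x = a \wedge y = b\}\quad t := x;\ x := y;\ y := t\quad \{x = b \wedge y = a\},$$

which follows mechanically from the assignment axiom and the rule for sequential composition.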


Proofs of the correctness of software are often long and boring, relying on nested case distinctions, in contrast to proofs in mathematics, which are usually deeper. Therefore the formal methods ideal seemed to fail: who would want to verify the correctness proofs, if they were longer than the program itself and utterly uninspiring? Below we will see that this situation, too, has changed.

2. Foundations of mathematics

A foundation for mathematics asks for a formal language in which one can express mathematical statements and a system of derivation rules using which one can prove some of these statements. In order to classify the many objects that mathematicians have considered, an 'ontology', describing ways in which collections of interest can be defined, comes in handy. This will be provided by set theory or type theory. Finally, one also needs to provide a model of computation in which algorithms performed by humans can be represented in one way or another. In other words, one needs logic, ontology and computability.

(a ) Logic

Not only did Aristotle describe the axiomatic method, he also started the quest for logic. This is the endeavour to chart the logical steps needed in mathematical reasoning. He started a calculus for deductions. The system was primitive: not all connectives (and, or, implies, not, for all, exists) were treated, and only monadic predicates, like P(n) being 'n is a prime number', were considered, not binary ones, like R(n, m) being 'n < m'. Nevertheless, the attempt to find rules sufficient for reasoning was quite daring.

The quest for logic, as needed for mathematical reasoning, was finished 2300 years later in Frege (1879). Indeed, Gödel (1930) showed that Frege's system was complete (mathematically this had already been done by Skolem (1922), but the result itself was not mentioned there). This means that from a set of hypotheses $\Gamma$ a statement $A$ can be derived iff in all structures in which the hypotheses $\Gamma$ hold, the conclusion $A$ also holds.

A particularly nice version of logic was given by Gentzen in 1934 (see his collected papers, Gentzen 1969). This system is presented in table 2. Some explanations are in order. The signs $\to$, $\&$, $\vee$, $\top$, $\forall$, $\exists$ stand for 'implies', 'and', 'or', 'true', 'for all' and 'exists', respectively. $\neg A$ stands for 'not $A$' and is defined as $A \to \bot$, where $\bot$ stands for a ('the') false statement (like $0 = 1$). $\Gamma$ stands for a set of statements, and $\Gamma \vdash A$ stands for 'from the set $\Gamma$ the statement $A$ is derivable'. A rule like

$$\frac{\Gamma \vdash A \,\&\, B}{\Gamma \vdash A}$$

has to be read as follows: 'If $A \,\&\, B$ is derivable from $\Gamma$, then so is $A$.'

(i) First-, second- and higher-order logic

The logic presented is first-order logic. It speaks about the elements of a structure $\mathcal{A}$ (a set with some given operations and relations on it) and can quantify over the elements of this structure. In second-order logic one can quantify over subsets of the structure $\mathcal{A}$, i.e. over $\mathcal{P}(\mathcal{A})$. Then there is higher-order logic, which can quantify over each $\mathcal{P}^n(\mathcal{A})$. In first-order logic one can distinguish between continuity and uniform continuity of a given function (say on $\mathbb{R}$):

$$\forall x \in \mathbb{R}\ \forall \varepsilon > 0\ \exists \delta > 0\ \forall y \in \mathbb{R}.\ |x - y| < \delta \Rightarrow |f(x) - f(y)| < \varepsilon,$$

versus

$$\forall \varepsilon > 0\ \exists \delta > 0\ \forall x, y \in \mathbb{R}.\ |x - y| < \delta \Rightarrow |f(x) - f(y)| < \varepsilon.$$

Here $\forall \varepsilon > 0.\ \ldots$ has to be translated to $\forall \varepsilon.[\varepsilon > 0 \Rightarrow \ldots]$.

In second-order logic one may express that an element $x$ of a group $G$ has torsion (a power of $x$ is the unit element $e$) without having the notion of natural number:

$$\forall X \in \mathcal{P}(G).\ x \in X \ \&\ [\forall y \in X.\ (x \cdot y) \in X] \Rightarrow e \in X.$$

This states that $e$ belongs to the intersection of all subsets of $G$ that contain $x$ and that are closed under left-multiplication by $x$.

Table 2. Predicate logic, natural deduction style

elimination rules / introduction rules:

$\to$: $\dfrac{\Gamma \vdash A \quad \Gamma \vdash A \to B}{\Gamma \vdash B}$ / $\dfrac{\Gamma, A \vdash B}{\Gamma \vdash A \to B}$

$\&$: $\dfrac{\Gamma \vdash A \,\&\, B}{\Gamma \vdash A}$, $\dfrac{\Gamma \vdash A \,\&\, B}{\Gamma \vdash B}$ / $\dfrac{\Gamma \vdash A \quad \Gamma \vdash B}{\Gamma \vdash A \,\&\, B}$

$\vee$: $\dfrac{\Gamma \vdash A \vee B \quad \Gamma, A \vdash C \quad \Gamma, B \vdash C}{\Gamma \vdash C}$ / $\dfrac{\Gamma \vdash A}{\Gamma \vdash A \vee B}$, $\dfrac{\Gamma \vdash B}{\Gamma \vdash A \vee B}$

$\forall$: $\dfrac{\Gamma \vdash \forall x.A}{\Gamma \vdash A[x := t]}$ ($t$ free for $x$ in $A$) / $\dfrac{\Gamma \vdash A}{\Gamma \vdash \forall x.A}$ ($x \notin \mathrm{FV}(\Gamma)$)

$\exists$: $\dfrac{\Gamma \vdash \exists x.A \quad \Gamma, A \vdash C}{\Gamma \vdash C}$ ($x \notin \mathrm{FV}(\Gamma, C)$) / $\dfrac{\Gamma \vdash A[x := t]}{\Gamma \vdash \exists x.A}$

$\bot$, $\top$: $\dfrac{\Gamma \vdash \bot}{\Gamma \vdash A}$ / $\Gamma \vdash \top$

start rule: $\Gamma \vdash A$ if $A \in \Gamma$; double-negation rule: $\dfrac{\Gamma \vdash \neg\neg A}{\Gamma \vdash A}$


In higher-order logic, one may state that there exists a non-trivial topology on $\mathbb{R}$ that makes a given function $f$ continuous:

$$\exists \mathcal{O} \in \mathcal{P}^2(\mathbb{R}).\ \mathcal{O} \text{ is a non-trivial topology} \ \&\ \forall O \in \mathcal{O}.\ f^{-1}(O) \in \mathcal{O}.$$

Here '$\mathcal{O}$ is a non-trivial topology' stands for

$$\mathcal{O} \neq \mathcal{P}(\mathbb{R}) \ \&\ \mathbb{R} \in \mathcal{O} \ \&\ \forall X \in \mathcal{P}(\mathcal{O}).\,[X \neq \varnothing \Rightarrow \textstyle\bigcup X \in \mathcal{O}] \ \&\ \forall X, Y \in \mathcal{O}.\ X \cap Y \in \mathcal{O}.$$

(ii) Intuitionistic logic

Not long after the first complete formalization of (first-order) logic was given by Frege, Brouwer criticized this system of 'classical logic'. It may promise an element when a statement like

$$\exists k.P(k)$$

has been proved, but nevertheless it may not be the case that a witness is found, i.e. one may not know how to prove any of

$$P(0),\ P(1),\ P(2),\ P(3),\ \ldots.$$

For example, this is the case for the statement

$$P(x) :\equiv (x = 0 \ \&\ \mathrm{RH}) \vee (x = 1 \ \&\ \neg\mathrm{RH}),$$

where RH stands for the Riemann Hypothesis, which can be formulated in (Peano) Arithmetic. The only possible witnesses are 0 and 1. By classical logic $\mathrm{RH} \vee \neg\mathrm{RH}$ holds. In the first case one can take $x = 0$ and in the second case $x = 1$. Therefore one can prove $\exists x.P(x)$. One, however, cannot provide a witness, as $P(0)$ can be proved only if the RH is proved and $P(1)$ can be proved only if the RH is refuted. At present neither is the case. One may object that 'tomorrow' the RH may be settled. But then one can take another open problem instead of the RH, or an independent statement, for example a Gödel sentence $G$ stating that

'$G$' is not provable,

or the Continuum Hypothesis $2^{\aleph_0} = \aleph_1$ (if we are in set theory). A similar criticism can be addressed to provable statements of the form $A \vee B$: these can be provable while neither $A$ nor $B$ can be proved.

Brouwer analysed the situation and concluded that the law of excluded middle, $A \vee \neg A$, is the cause of this unsatisfactory situation. He proposed to do mathematics without this 'unreliable' logical principle. In Heyting (1930) an alternative logic was formulated. For this logic one can show that

$$\vdash A \vee B \iff\ \vdash A \ \text{or}\ \vdash B,$$

and similarly

$$\vdash \exists x.P(x) \iff\ \vdash P(t),\ \text{for some expression } t.$$


Gentzen provided a convenient axiomatization of both classical and intuitionistic logic. In table 2 the system of classical logic is given; if one leaves out the rule of double negation, one obtains the system of intuitionistic logic.

(b ) Ontology

Ontology is the philosophical theory of 'existence'. Kant remarked that existence is not a predicate. He probably meant that in order to state that something exists we already must have it. Nevertheless, we can state that there exists a triple $(x, y, z)$ of positive integers such that $x^2 + y^2 = z^2$ (as Pythagoras knew), but not such that $x^3 + y^3 = z^3$ (as Euler knew). Ontology in the foundations of mathematics focuses on collections of objects $O$, so that one may quantify over them (i.e. stating $\forall x \in O.P(x)$ or $\exists x \in O.P(x)$). Traditional mathematics only needed a few of these collections: number systems and geometric figures. From the 19th century on, a wealth of new spaces was needed and ample time was devoted to constructing these. Cantor (1845–1918) introduced set theory, which has the virtue of bringing together all possible spaces within one framework. Actually this theory is rather strong, and not all postulated principles are needed for the development of mathematics. An interesting alternative is type theory, in which the notion of function is a first-class object.

(i) Set theory

Postulated are the following axioms of ‘set existence’:

These axioms have as intuitive interpretation the following. N is a set; if a, bare sets, then {a, b} is a set;.; if P is a property over sets, then {x2ajP(x)} is aset; if for every set x there is given a unique F(x) in some way or another, then{F(x)jx2a} is a set. We will not spell out the way the above axioms have to beformulated and how P and F are given, but refer the reader to a textbook onaxiomatic set theory, see e.g. Kunen (1983). Also there are the axioms of ‘setproperties’:

a Z b5cx:½x2a5x2b� ðextensionalityÞ

ca:½½dx:x2a�0dx:½x2a &ldy:y2x &y2a�� ðfoundationÞ:

The axiom of extensionality states that a set is completely determined by itselements. The axiom of foundation is equivalent with the statement that everypredicate P on sets is well-founded: if there is a witness x such that P(x) holds, thenthere is a minimal witness x. This means that P(x) but for no y2x one has P(y).Another way to state foundation:caldf 2ðN/aÞcn2N:f ðnC1Þ2f ðnÞ.


(ii) Type theory

Type theory, coming in several variants, forms an alternative to set theory. Postulated are inductively defined data types with their recursively defined functions. Moreover, types are closed under function spaces and products. A type may be thought of as a set, and that an element $a$ belongs to type $A$ is denoted by $a : A$. The difference with set theory is that in type theory an element has a unique type. Inductive types are given in the following examples (booleans, natural numbers, lists of elements of $A$, binary trees with natural numbers at the leaves):

bool := true | false,
nat := 0 | S(nat),
list_A := nil | cons(A, list_A),
tree := leaf(nat) | branch(tree, tree).

These definitions should be read as follows. The only elements of bool are true and false. The elements of nat are freely generated from 0 and the unary 'constructor' S, obtaining 0, S(0), S(S(0)), …. One writes for elements of nat 1 = S(0), 2 = S(1), ….

A typical element of list_nat is

$$\langle 1, 0, 2\rangle = \mathrm{cons}(1, \mathrm{cons}(0, \mathrm{cons}(2, \mathrm{nil}))).$$

A typical tree is, for example, branch(leaf(1), branch(leaf(2), leaf(3))).

A typical element of $A \times B$ is $\langle a, b\rangle = \mathrm{pair}(a, b)$, where $a : A$, $b : B$. Given types $A$, $B$, one may form the 'function-space' type $A \to B$. There is the primitive operation of application: if $f : A \to B$ and $a : A$, then $f(a) : B$. Conversely there is abstraction: if $M : B$ 'depends on' an $a : A$ (like $a^2 + a + 1$ depends on $a : \mathrm{nat}$), one may form the function $f := (a \mapsto M) : A \to B$. This function is denoted by $\lambda a{:}A.M$ (function abstraction). For example, this can be used to define composition: if $f : A \to B$ and $g : B \to C$, then $g \circ f := \lambda a{:}A.\,g(f(a)) : A \to C$. Next to the formation of function-space types there is the dependent cartesian product: if $B$ is a type that depends on an $a : A$, then one may form $\Pi a{:}A.B$. One has (here $B[a := t]$ denotes the result of substituting $t$ for $a$ in $B$)

$$f : (\Pi a{:}A.B),\ t : A \ \Longrightarrow\ f(t) : B[a := t].$$

A typical example is $B = A^n$ for $n : \mathrm{nat}$: if $f : (\Pi n{:}\mathrm{nat}.A^n)$, then $f(2n) : A^{2n}$. Type theories are particularly convenient for expressing intuitionistic mathematics. Type theories differ as to which dependent cartesian products and which inductive types are allowed, whether or not they are predicative,2 and whether they have 'powersets' or the axiom of choice. See Martin-Löf (1984), Aczel & Rathjen (2001), Barendregt & Geuvers (2001) and Moerdijk & Palmgren (2002). In Feferman (1998, ch. 14), a type-free system (which can be seen as a type system with just one type) is presented for predicative mathematics.

(c ) Computability

Mathematical algorithms are much older than mathematical proofs. They were introduced in Egyptian–Babylonian–Chinese mathematics long before the notion of proof. In spite of that, reflection on the notion of computability through algorithms appeared much later, only about 80 years ago. The necessity came when Hilbert announced in 1900 his famous list of open problems. His 10th problem was the following.

Given a Diophantine equation with any number of unknown quantities and with rational integral numerical coefficients: to devise a process according to which it can be determined by a finite number of operations whether the equation is solvable in rational integers.3

By a number of steps over a time interval of nearly 50 years, the final one by Matijasevic using the Fibonacci numbers, this problem was shown to be undecidable; see Davis (1973). In order to be able to state such a result one needed to reflect on the notion of algorithmic computability.

Steps towards the formalization of the notion of computability were taken by Skolem, Hilbert, Gödel, Church and Turing. At first Hilbert (1926; based on work by Grassmann, Dedekind, Peano and Skolem) introduced the primitive recursive functions over $\mathbb{N}$ by the schemata shown in figure 1.4

Figure 1. The primitive recursive functions.

It was shown by Sudan (1927) and Ackermann (1928) that not all computable functions are primitive recursive. Then Gödel (based on a suggestion of Herbrand) introduced the notion of totally defined computable functions,5 based on what are nowadays called Term Rewrite Systems; see Terese (2003). This class of total computable functions can also be obtained by adding to the primitive recursive schemata the scheme of minimalization ('$\mu y.\ldots$' stands for 'the least $y$ such that …'); see Kleene (1936).

2 In predicative systems a subset of an infinite set $X$ can only be defined if one does not refer to the class of all subsets of $X$. For example $\{n \in \mathbb{N} \mid n \text{ is even}\}$ is allowed, but not $\{n \in \mathbb{N} \mid \forall X \subseteq \mathbb{N}.\,P(X, n)\}$.
3 By 'rational integers' Hilbert just meant the set of integers $\mathbb{Z}$. This problem is equivalent to the problem over $\mathbb{N}$. The solvability of Diophantine equations over $\mathbb{Q}$ is still open.
4 This definition scheme was generalized by Scott (1970) to inductive types. For example, over the binary trees introduced above one can define a primitive recursive function mirror as follows: mirror(leaf(n)) = leaf(n), mirror(branch(t1, t2)) = branch(mirror(t2), mirror(t1)). It mirrors the tree displayed above.
5 Previously called (total) recursive functions.
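The flavour of primitive recursion, and of the minimalization scheme that extends it, can be conveyed by a small Haskell sketch of our own (with Integer standing in for $\mathbb{N}$; the function names are ours):

```haskell
-- Addition and multiplication by primitive recursion on the first argument.
plus :: Integer -> Integer -> Integer
plus 0 m = m                         -- base case
plus n m = 1 + plus (n - 1) m        -- step: plus (n+1) m = S (plus n m)

times :: Integer -> Integer -> Integer
times 0 _ = 0
times n m = plus m (times (n - 1) m)

-- Minimalization 'mu y. f y = 0': the least such y, diverging when no
-- such y exists -- exactly how one passes from total to partial functions.
mu :: (Integer -> Integer) -> Integer
mu f = head [y | y <- [0 ..], f y == 0]
```

For instance, mu (\y -> y * y - 25) evaluates to 5, while mu (\y -> y * y + 1) runs forever.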

Finally, it was realized that it is more natural to formalize computable partial functions. This was done by Church (1936) using lambda calculus and by Turing (1936) using what are now called Turing machines. The formalized computational models of Turing and Church later gave rise to the so-called imperative and functional programming styles. The first is easier to implement, the second easier to use and to show the correctness of programs.

Both the computational models of Church and Turing have a description about as simple as that of the first-order predicate calculus. Simpler still is the computational model given in Schönfinkel (1924), which also captures all the partial computable functions. It is a very simple example of a Term Rewrite System (figure 2).

Figure 2. CL, combinatory logic: $K\,x\,y \to x$ and $S\,x\,y\,z \to x\,z\,(y\,z)$.

The system is based on terms built up from the constants K and S under a binary operation (application). Various forms of data (natural numbers, trees, etc.) can be represented as K, S expressions. Operations on these represented data can be performed by other such expressions.
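A complete evaluator for this rewrite system fits in a dozen lines of Haskell (our sketch, not part of the original paper), which underlines how little machinery the model needs:

```haskell
-- Terms of combinatory logic: the constants K and S, and application.
data Term = K | S | App Term Term deriving Show

-- One leftmost-outermost rewrite step, if a redex exists:
--   K x y    rewrites to  x
--   S x y z  rewrites to  x z (y z)
step :: Term -> Maybe Term
step (App (App K x) _)         = Just x
step (App (App (App S x) y) z) = Just (App (App x z) (App y z))
step (App f a) = case step f of
  Just f' -> Just (App f' a)
  Nothing -> App f <$> step a
step _ = Nothing

-- Repeat until no rule applies (this may loop forever, as any
-- universal model of computation must allow).
normalize :: Term -> Term
normalize t = maybe t normalize (step t)
```

For example, normalize (App (App (App S K) K) S) returns S, exhibiting S K K as the identity combinator.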

(d ) The compactness of the foundations

The study of the foundations of mathematics has achieved the following. The triple activity of defining, computing and reasoning can be described in each case by a small set of rules. This implies that it is decidable whether a (formalized) putative proof p (from a certain mathematical context) is indeed a proof of a given statement A (in that context). This is the basis of the technology of computer mathematics. For more on the relation between the foundational studies and computer mathematics, see Barendregt (2005).

3. Computer mathematics

In systems for Computer Algebra one can deal with mathematical objects like $\sqrt{2}$ with full precision. The idea is that this number is represented as a symbol, say $a$, and that with this symbol one computes symbolically. One has $a^2 - 2 = 0$, but $a + 1$ cannot be simplified. This can be done since the computational rules for $\sqrt{2}$ are known: in some sense $\sqrt{2}$ is a 'computable object'. There are many other computable objects, like expressions dealing with transcendental functions ($e^x$, $\log x$) and with integration and differentiation.

In systems for computer mathematics, also called Mathematical Assistants, one can even represent non-computable objects, for example the set $S$ of parameters for which a Diophantine equation is solvable. These too can be represented on a computer. Again the non-computable object is represented by a symbol. This time one cannot simply compute whether a given number, say 7, belongs to $S$. Nevertheless, one can state that it does, and in some cases one may prove this. If one provides a proof of this fact, then that proof can be checked, and one can add $7 \in S$ to the database of known results and use it in subsequent reasoning. In short, although provability is undecidable, being a proof of a given statement is decidable, and this is the basis of systems for computer mathematics. It has been the basis for informal mathematics as well.

One may wonder whether proofs verified by a computer are at all reliable. Indeed, many computer programs are faulty. It was emphasized by de Bruijn that in the case of verification of formal proofs there is an essential gain in reliability. Indeed, a verifying program only needs to see whether in the putative proof the small number of logical rules are always observed. Although the proof may have the size of several megabytes, the verifying program can be small. This program then can be inspected in the usual way by a mathematician or logician. If someone does not believe the statement that a proof has been verified, one can do independent checking by a trusted proof-checking program. In order to do this one does need formal proofs of the statements. A Mathematical Assistant that allows such independent checking by a small program is said to satisfy the de Bruijn criterion.
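The point can be made concrete with a toy checker of our own devising, for a Hilbert-style system over implication only, with axiom schemes K: $A \to (B \to A)$ and S: $(A \to (B \to C)) \to ((A \to B) \to (A \to C))$ and modus ponens. However large the proof and however it was found, trusting a verification means trusting only these few lines:

```haskell
-- Formulas of minimal implicational logic.
data Form = Var String | Imp Form Form deriving (Eq, Show)

-- A proof is a list of steps: axiom instances, or modus ponens applied
-- to two earlier lines, referred to by 0-based position.
data Step = AxK Form Form | AxS Form Form Form | MP Int Int

-- The entire 'trusted' checker: all proved lines, or Nothing on error.
check :: [Step] -> Maybe [Form]
check = foldl add (Just [])
  where
    add Nothing   _ = Nothing
    add (Just ls) s = case s of
      AxK a b   -> Just (ls ++ [Imp a (Imp b a)])
      AxS a b c -> Just (ls ++ [Imp (Imp a (Imp b c))
                                    (Imp (Imp a b) (Imp a c))])
      MP i j
        | 0 <= i, i < length ls, 0 <= j, j < length ls
        , Imp p q <- ls !! j, ls !! i == p -> Just (ls ++ [q])
        | otherwise -> Nothing
```

With p = Var "p", the five-step proof [AxS p (Imp p p) p, AxK p (Imp p p), MP 1 0, AxK p p, MP 3 2] checks successfully and ends in Imp p p, a machine-checked proof of $p \to p$.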

Of particular interest are proofs that essentially contain computations. This happens at all levels of complexity. In order to show that a linear transformation $A$ on a finite-dimensional vector space has a real eigenvalue, one computes

$$\det(A - \lambda I) = p(\lambda),$$

and determines whether $p(\lambda)$ has a real root. In order to show that a polynomial function $F$ vanishes identically on some variety $V$, one computes a Gröbner basis to determine whether $F$ is contained in the ideal generated by the equations defining $V$; see Buchberger & Winkler (1998).

Although provability in general is undecidable, for interesting particular cases the provability of statements may be reduced to computing. These form the decidable cases of the decision problem, and they help computer mathematics considerably. Tarski (1951) showed that the theory of real closed fields (and hence elementary geometry) is decidable. An essential improvement was given by Collins (1975). In Buchberger (1965) a method was developed to decide membership of finitely generated ideals in certain polynomial rings. For polynomials over $\mathbb{R}$ this can also be done by the Tarski–Collins method, but much less efficiently so. Moreover, 'Buchberger's algorithm' was optimized by, e.g., Bachmair & Ganzinger (1994).

In order to show that the Four Colour Theorem holds, one checks that 633 configurations are 'reducible', involving millions of cases; see Robertson et al. (1996). How can such computations be verified? All these cases can be stylistically rendered as statements $f(a) = b$ that need to be verified. In order to do this one first needs to represent $f$ in the formal system. One way to do this is to introduce a predicate $P_f(x, y)$ such that for all $a$, $b$ (say natural numbers) one has

$$f(a) = b \iff\ \vdash P_f(\bar{a}, \bar{b}).$$

Here '$\vdash$' stands for provability. If e.g. $a = 2$, then $\bar{a} = S(S(0))$, a representation of the object 2 in the formal system.6 In these languages algorithms are represented as so-called 'logical programs', as happened also in Gödel (1931). In other formal theories, notably those based on type theory, the language itself contains expressions for functions, and the representing predicate has a particularly natural form:

$$P_f(x, y) :\equiv (F(x) = y).$$

This is the representation of the algorithm in the style of functional programming. Of course all this is not enough. One also needs to prove that the computation is relevant. For example, in the case of linear transformations one needs a formal proof of

$$P_f(\bar{A}, \bar{0}) \iff A \text{ has an eigenvalue}.$$

But once this proof is given and verified, one only needs to check instances of $P_f(\bar{a}, \bar{b})$ to establish $f(a) = b$.

There are two ways of doing this. In the first, the computation trace is produced and annotated by steps in the logical program $P_f$ (respectively, the functional program $F$). This produces a very long proof (of the order of the length of the computation of $f(a) = b$) that can be verified step by step. Since the resulting proofs become long, they are usually not stored; only the local steps are verified ('Does this follow from that and that?'). One can therefore refer to these as ephemeral proofs.

On the other hand, there are systems in which proofs are fully stored for later use (like extraction of programs from them). These may be called petrified proofs. In systems with such proofs one has often adopted the Poincaré Principle. This principle states that for a certain class of equations $t = s$ no proofs are needed, provided that their validity can be checked by an algorithm. This puts some strain on the de Bruijn criterion, which requires that the verifying program be simple. But since the basic steps in a universal computational model are simple, this is justifiable.

6 For substantial computations one needs to introduce decimal (or binary) notation for numbers and prove that the operations on them are correctly defined. In the history of mathematics it was al-Khwarizmi (780–850) who did not introduce algorithms, as the name suggests, but proved that the well-known basic operations on the decimal numbers are correct.


4. The nature of the challenge

(a ) State of the art: effort and space

Currently there are not many people who formalize mathematics with the computer, but that does not mean that the field of computer mathematics is not yet mature. The full formalization of all of undergraduate university mathematics is within reach of current technology. Formalizing at that level will be labour-intensive, but it will not need any advances in proof assistant technology.

To give an indication of how much work is needed for formalization, we estimate that it takes approximately one work-week (five work-days of eight work-hours) to formalize one page from an undergraduate mathematics textbook. This measure is surprisingly low for some people and surprisingly high for others. Some people think it is impossible to formalize a non-trivial theorem in full detail all the way down to the axioms, and this measure shows that they are wrong. On the other hand, it takes much longer to formalize a proof than it takes to write a good informal version of it (the latter takes about half a work-day per page, a factor of ten less).

One can also compare the formal version of a mathematical proof with the corresponding informal—'traditional'—way of presenting that proof. In Wiedijk (2000) it is experimentally found that a file containing a full formalization of a mathematical theory is approximately four times as long as the LaTeX source of the informal presentation. We call this factor the de Bruijn factor, as de Bruijn claimed that this ratio is a constant, which does not change when one proceeds in formalizing a mathematical theory. Some researchers actually believe that the factor decreases as the theory grows.

(b ) State of the art: systems

In figure 3 some contemporary systems for computer mathematics that are especially suited to the formalization of mathematical proof are presented. On the left in this diagram are the four 'prehistoric' systems that started the subject in the early seventies (three of those systems are no longer actively used and have their names in parentheses). These systems differed in the amount of automated help that they gave their users when doing proofs. At one extreme there was the Automath system of de Bruijn, which had no automation whatsoever: all the details of the proofs had to be provided by the user of the system himself (it is surprising how far one still can go in such a system). At the other extreme there was the nqthm system—also known as the Boyer–Moore prover—which fully automatically tries to prove the lemmas that the user of the system puts to it. In between these two extremes there was the LCF system, which implemented an interesting compromise: the user of this system was in control of the proof, but could make use of the automation of so-called tactics, which tried to do part of the proof automatically. As will be apparent from the diagram, the LCF system was quite influential. The seven systems in this diagram are those contemporary Proof Assistants in which a significant body of mathematics has been formalized. To give some impression of the 'flavour' of those systems, we have put a superficial characterization in the right margin. See http://www.cs.ru.nl/~freek/digimath/ for the web-addresses with information on these systems. The ontologies on which these systems are based are stated in table 3.

All of these systems (with the exception of Automath and Mizar) were primarily motivated by computer science applications. Being able to prove algorithms and systems correct is at the moment the main driving force for the development of Proof Assistants. This is an extremely exciting area of research. Currently, people who have experience with programming claim to 'know' that serious programs without bugs are impossible. However, we think that eventually the technology of computer mathematics will evolve into a methodology that will change this perception. Then a bug-free program will be as normal as a 'bug-free' formalized proof is for those of us who do formal mathematics.

When one starts applying the technique to mathematics, one may be struck when finishing a formalization. Usually one needs to go over a proof when it is finished, to make sure one really has understood everything and made no mistakes. But with a formalization that phase is not needed anymore. One can even finish a proof before one has fully understood it! The feeling in that case is not unlike trying to take another step on a staircase which turns out not to be there.

Figure 3. Some systems for computer mathematics.

Table 3. Foundational bases for systems of computer mathematics

systems          basis
Mizar            set theory
Coq, NuPRL       intuitionistic type theory
HOL, Isabelle    higher-order logic
PVS              higher-order logic with predicate subtypes
ACL2             primitive recursive arithmetic


On the other hand, when one returns from formalization to 'normal' programming, it feels as if a safety net has been removed. One can then write down incorrect things again, without the system noticing!

A currently less successful application of Proof Assistants, but one which in the long run will turn out to be even more important than verification in computer science, is the application of Proof Assistants to mathematics itself. The QED manifesto, see Boyer et al. (1994), gives a lucid description of how this might develop. We believe that when later generations look back at the development of mathematics, they will recognize four important steps: (i) the Egyptian–Babylonian–Chinese phase, in which correct computations were made, without proofs; (ii) the ancient Greeks, with the development of 'proof'; (iii) the end of the nineteenth century, when mathematics became 'rigorous'; (iv) the present, when mathematics (supported by computer) finally becomes fully precise and fully transparent.

To show what current technology is able to do, in table 4 we list some theorems that have already been formalized. Clearly the technology has not yet reached 'the research frontier', but the theorems that can be formalized are not exactly trivial either.

The formalizations listed in this table are much like computer programs. To give an indication of the size of these formalizations: the Isabelle formalization of the Prime Number Theorem by Avigad and others consists of 44 files that together take 998 kb in almost thirty thousand lines of 'code'.

(c ) What is needed?

Today no mathematician uses a Proof Assistant for checking or developing new work. We believe that in the coming decades this will change (although we do not know exactly when). We now list some properties that a system for computer mathematics should have before this will happen.

Table 4. Formalized mathematics

theorem                                   system
Hahn–Banach theorem                       Mizar, ALF^a, Isabelle
law of quadratic reciprocity              nqthm, Isabelle
Gödel's first incompleteness theorem      nqthm, Coq
correctness of Buchberger's algorithm     ACL2, Agda^a, Coq
fundamental theorem of Galois theory      Lego^a
fundamental theorem of calculus           many systems
fundamental theorem of algebra            Mizar, HOL, Coq
Bertrand's postulate                      HOL, Coq
prime number theorem                      Isabelle
four colour theorem                       Coq
Jordan curve theorem                      HOL
textbook on continuous lattices           Mizar

^a The ALF and Lego systems are Proof Assistants from the Automath/Coq/NuPRL tradition that are no longer in use. Agda is the successor of ALF: it is related to Automath but not to LCF.


(i) Mathematical style

In the current proof assistants the mathematics does not resemble traditional mathematics very much. This holds both for the statements and for the proofs. As an example consider the following statement:

$$\lim_{x \to x_0} \bigl(f(x) + g(x)\bigr) = \lim_{x \to x_0} f(x) + \lim_{x \to x_0} g(x). \qquad (\dagger)$$

In the HOL system this statement is called LIM_ADD, and there it reads7 [the HOL rendering did not survive transcription]. This does not match the LaTeX version of the statement. (The technical reason for this is that, as HOL does not support partial functions, the limit operator is represented as a relation instead of as a function.)

In the Mizar library the statement is called LIMFUNC3:37, and there it reads (where for clarity we replaced the condition of the statement, which states that the limits actually exist, by an ellipsis) [the Mizar rendering did not survive transcription]. Again this does not resemble the informal version of the statement. (Here the reason is that Mizar does not support binders, and therefore the limit operator cannot bind the limit variable. Therefore the functions f and g have to be added instead of the function values f(x) and g(x).)

Clearly, unless a system can accept this statement written in a form close to (†), mathematicians will not be very much inclined to use it.8

7 Here '!' stands for '∀' and '\' stands for 'λ'.
8 Structurally this last version is what one would like to write, but typographically it still is not ideal. Perhaps one day it will be possible to use in a proof assistant a mathematical style like in (†) above or, at least, the LaTeX source for it.

While in most current systems the statements themselves do not look much like their informal counterparts, for the proofs it is even worse. The main exceptions to this are the Mizar language, and the Isar language for the Isabelle system. We call these two proof languages mathematical modes. As an example, the following is what a proof looks like in the actual


Coq system.9 [The Coq proof script did not survive transcription.] Not even a Coq specialist will be able to understand what is going on in this proof without studying it closely with the aid of a computer. It will be clear why we think that having a mathematical mode is essential for a system to be attractive to working mathematicians.

As an example of what a mathematical mode looks like, figure 4 shows the Coq proof rephrased using the Mizar proof language.10

Figure 4. A proof in mathematical mode.

9 In this proof 'limit_in1 f D l x0' is the Coq notation for $\lim_{x \to x_0} f(x) = l$, where $x$ ranges over the set $D \subseteq \mathbb{R}$.
10 Actually, this is a mixture of Mizar and Coq. The proof language is Mizar, but the statements are written in Coq syntax. We do this to be able to follow the Coq script very closely.

(ii) Library

The most important part of a proof assistant is its library of pre-proved lemmas. If one looks at which systems are useful for doing formal mathematics, those are exactly the systems with a good library. Using an average system with a good library is painful but doable. Using an excellent system without a library is not. The bigger the library, the more mathematics one can deal with in a reasonable time.

As an example, in Nijmegen we formalized a proof of the Fundamental Theorem of Algebra (see Geuvers et al. 2001) and it took a team of three people two years. At the same time Harrison formalized the same theorem all by himself (as described in Harrison 2001) and it only took him a few days. The main difference which explains this huge difference in effort needed is that he already had an extensive library, while in Nijmegen we had not.11

11 Another difference was that in Nijmegen we formalized an intuitionistic proof, while Harrison formalized a classical proof. But when analysing the formalizations, it turned out that this was not the main reason for the difference in work needed.

(iii) Decision procedures

One might imagine that the computer can help mathematicians find proofs. However, automated theorem proving is surprisingly weak when it comes to finding proofs that are interesting to human mathematicians. Worse, if one takes an existing informal textbook proof, and considers the gaps between the steps in that proof as 'proof obligations', then a general-purpose theorem prover often will not even be able to find proofs for those. For this reason Shankar, whose group is developing PVS, emphasized that, rather than general automated theorem provers, the decision procedures, which specialize in one very specific task, are important, as they will always be able to solve problems in a short time. In fact, Shankar claims that the big success of PVS is mainly due to the fact that it has the best decision procedures of all the systems, and combines those well.

Our view on automating computer mathematics is that a proof is something like an iceberg. When considering all details of the proof, a human mathematician will not even be consciously aware of the majority of those, just as an iceberg is 90% under water. What is written in papers and communicated in lectures is only the 10% of the proof (or even less) which is present in the consciousness of the mathematician. We think that the automation of a system for computer mathematics should provide exactly those unconscious steps in the proof. (There is a risk of having the computer provide too many steps, so that we will not understand anymore what it is doing, and then we will not be able to guide the proof any longer.)

One should make a distinction between unconscious steps and decidable ones. Some unconscious steps may be guided in undecidable areas by heuristic tactics. Also, some decision procedures have horrendous complexity, so it is not necessarily the case that they will 'solve problems in a short time'. However, we like to emphasize that the main function of automation in proof assistants should be taking care of unconscious steps, and that decision procedures are an important part of that.

(iv) Support for reasoning with gaps

The manner in which proof assistants are generally being used today is that the whole formalization is completed all the way down to the axioms of the system. This is for a good reason: it turns out that it is very difficult to write down fully correct formal statements without having the computer help 'debug' the statements by requiring the proofs to be formalized. If one starts a formalization by first writing down a global sketch of a theory, then, when filling in the actual formal proofs, it often turns out that some of those statements are not provable after all!

If one just wants to use a Proof Assistant to order one's thoughts, or to communicate something to another mathematician, then fully working out all proofs is just not practical. In that case one would like to just give a sketch of the proof inside the formal system, as described in Wiedijk (2004). Related to this is the technique called proof planning; see for instance Bundy (1991). Still, the current Proof Assistants do not support this way of working very well.

In Lamport (1995) a proof style is described in which proofs are incrementally developed by refining steps in the proof into more detailed steps. Although that paper does not talk about proofs in the computer, and although we are not sure that the specific proof display format advocated in that paper is optimal, it is clear that this style of working should be supported by systems for computer mathematics, in order to be accepted by the mathematical community.

5. Romantic versus cool mathematics

After the initial proposals of the possibility of computer mathematics, many mathematicians protested on emotional grounds. 'Proofs should be surveyable in our mind', was and still is an often heard objection. We call this the romantic attitude towards mathematics. There is another style, cool mathematics: mathematics verified by a computer. The situation may be compared to that in biology. In romantic biology, based on the human eye, one is concerned with flowers and butterflies. In cool biology, based on the microscope, an extension of our eyes, one is concerned with cells. There is even super-cool molecular biology, based on electron microscopes. By now we know very well that these latter forms of biology are vital and essential and have a romanticism of their own. Similarly, we expect that cool proofs in mathematics will eventually lead to romantic proofs based on them. In comparison with biology there is also super-cool mathematics, checked by a computer with a program that is this time not checked by the human mind, but checked by a computer in the cool way. This kind of bootstrap has been used for a compiled (hence faster) version of Coq; see Barras (1996, 1999).

A fully formalized proof in Coq of the Four Colour Theorem has been verified; see Gonthier (2004). Moreover, a full proof in HOL of the Jordan curve theorem has been produced by Tom Hales as part of his work towards a full formalization of his proof of the Kepler conjecture. Both informal proofs need a long computer verification. These kinds of theorems with a long proof seem exceptional, but they are not: from the undecidability of provability it follows trivially that there will be relatively short statements with arbitrarily long proofs.12

12 Indeed, if every theorem of length $n$ had a proof of length at most $2^{2^n}$, then theoremhood would be decidable by checking all the possible candidate proofs.

We foresee that in the future cool proofs will have romantic consequences and moreover that computer mathematics will have viable applications.

The authors thank Mark van Atten, Wieb Bosma, Femke van Raamsdonk and Bas Spitters for useful input.

References

Ackermann, W. 1928 Zum Hilbertschen Aufbau der reellen Zahlen. Mathematische Annalen 99, 118–133. (doi:10.1007/BF01459088.)

Aczel, P. & Rathjen, M. 2001 Notes on constructive set theory. Technical report, Institut Mittag-Leffler. http://www.ml.kva.se/preprints/meta/AczelMon_Sep_24_09_16_56.rdf.html.

Appel, K. & Haken, W. 1977a Every planar map is four colorable. Part I. Discharging. Illinois J. Math. 21, 429–490.

Appel, K. & Haken, W. 1977b Every planar map is four colorable. Part II. Reducibility. Illinois J. Math. 21, 491–567.

Aschbacher, M. 2004 The status of the classification of the finite simple groups. Notices Am. Math. Soc. 51, 736–740.

Bachmair, L. & Ganzinger, H. 1994 Buchberger’s algorithm: a constraint-based completion procedure. In Constraints in computational logics (Munich, 1994), Lecture Notes in Computer Science, vol. 845, pp. 285–301. Berlin: Springer.

Barendregt, H. In press. Foundations of mathematics from the perspective of computer verification. In Mathematics, computer science, logic – a never ending story. New York: Springer.

Barendregt, H. & Geuvers, H. 2001 Proof-assistants using dependent type systems. In Handbook of automated reasoning (ed. Alan Robinson & Andrei Voronkov), pp. 1149–1238. Amsterdam: Elsevier Science Publishers B.V.

Barras, B. 1996 Verification of the interface of a small proof system in Coq. In Proceedings of the 1996 Workshop on Types for Proofs and Programs (ed. E. Gimenez & C. Paulin-Mohring), pp. 28–45. Aussois, France: Springer.

Barras, B. 1999 Auto-validation d’un système de preuves avec familles inductives. Thèse de doctorat, Université Paris 7.

Boyer, R. et al. 1994 The QED manifesto. In Automated deduction – CADE 12, LNAI 814 (ed. A. Bundy), pp. 238–251. Berlin: Springer. http://www.cs.ru.nl/~freek/qed/qed.ps.gz.

Buchberger, B. 1965 An algorithm for finding a basis for the residue class ring of a zero-dimensional polynomial ring. Dissertation, University of Innsbruck.

Buchberger, B. & Winkler, F. 1998 Gröbner bases and applications. Cambridge: Cambridge University Press.

Bundy, A. 1991 A science of reasoning. In Computational logic: essays in honor of Alan Robinson (ed. J.-L. Lassez & G. Plotkin), pp. 178–198. Cambridge, MA: MIT Press. Also available from Edinburgh as DAI Research Paper 445.

Church, A. 1936 An unsolvable problem of elementary number theory. Am. J. Math. 58, 345–363.

Collins, G. E. 1975 Quantifier elimination for real closed fields by cylindrical algebraic decomposition. In Automata theory and formal languages (Second GI Conference, Kaiserslautern, 1975), Lecture Notes in Computer Science, vol. 33, pp. 134–183. Berlin: Springer.

Davis, M. 1973 Hilbert’s tenth problem is unsolvable. Am. Math. Monthly 80, 233–269.

Feferman, S. 1998 In the light of logic. Oxford: Oxford University Press.

Floyd, R. W. 1967 Assigning meanings to programs. In Mathematical aspects of computer science, Proceedings of Symposia in Applied Mathematics, pp. 19–32. Providence, RI: American Mathematical Society.

Frege, G. 1879 Begriffsschrift und andere Aufsätze. Hildesheim: Georg Olms. Zweite Auflage. Mit E. Husserls und H. Scholz’ Anmerkungen herausgegeben von Ignacio Angelelli, Nachdruck.

Gentzen, G. 1969 The collected papers of Gerhard Gentzen. Studies in logic and the foundations of mathematics (ed. M. E. Szabo). Amsterdam: North-Holland.

Geuvers, H., Wiedijk, F. & Zwanenburg, J. 2001 A constructive proof of the fundamental theorem of algebra without using the rationals. In Types for proofs and programs (ed. Paul Callaghan, Zhaohui Luo, James McKinna & Robert Pollack), Proceedings of the International Workshop TYPES 2000, LNCS 2277, pp. 96–111. Berlin: Springer.

Gödel, K. 1930 Die Vollständigkeit der Axiome des logischen Funktionenkalküls. Monatshefte für Mathematik und Physik 37, 349–360. (doi:10.1007/BF01696781.)

Gödel, K. 1931 Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme. Monatshefte für Mathematik und Physik 38, 173–198. (doi:10.1007/BF01700692.) Translated and commented in Gödel (1986). Another English version, based on course notes by Kleene and Rosser, is in Gödel (1965).

Gödel, K. 1965 On undecidable propositions of formal mathematical systems. In The undecidable: basic papers on undecidable propositions, unsolvable problems and computable functions (ed. Martin Davis), pp. 41–74. New York: Raven Press. From mimeographed notes on lectures given by Gödel in 1934.

Gödel, K. 1986 Collected works, vol. 1. Publications 1929–1936 (ed. S. Feferman et al.). New York: The Clarendon Press/Oxford University Press.

Gödel, K. 1995 Collected works III: unpublished essays and lectures (ed. S. Feferman et al.). Oxford: Oxford University Press.

Gonthier, G. 2004 The four colour theorem in Coq. Talk at the TYPES 2004 conference, December 15–18, 2004, Campus Thales, Jouy-en-Josas, France.

Hales, T. In press. A proof of the Kepler conjecture. Ann. Math. http://www.math.pitt.edu/~thales/kepler03/fullkepler.pdf.

Harrison, J. 2001 Complex quantifier elimination in HOL. In TPHOLs 2001: Supplemental Proceedings (ed. Richard J. Boulton & Paul B. Jackson), pp. 159–174. Edinburgh: Division of Informatics, University of Edinburgh. Published as Informatics Report Series EDI-INF-RR-0046. http://www.informatics.ed.ac.uk/publications/report/0046.html.

Heyting, A. 1930 Die formalen Regeln der intuitionistischen Logik. Sitzungsberichte der Preussischen Akademie der Wissenschaften, Physikalisch-mathematische Klasse, pp. 42–56.

Hilbert, D. 1926 Über das Unendliche. Mathematische Annalen 95, 161–190. (doi:10.1007/BF01206605.)

Hoare, C. A. R. 1969 An axiomatic basis for computer programming. Commun. ACM 12, 576–583. (doi:10.1145/363235.363259.)

Horgan, J. 1993 The death of proof. Sci. Am. 269, 92–103.

Husserl, E. 1901 Untersuchungen zur Phänomenologie und Theorie der Erkenntnis. Halle: Max Niemeyer.

Kempe, A. B. 1879 On the geographical problem of the four colors. Am. J. Math. 2, 193–200.

Kleene, S. C. 1936 Lambda-definability and recursiveness. Duke Math. J. 2, 340–353. (doi:10.1215/S0012-7094-36-00227-2.)

Kunen, K. 1983 Set theory: an introduction to independence proofs. Studies in logic and the foundations of mathematics, vol. 102. Amsterdam: North-Holland. Reprint of the 1980 original.

Lamport, L. 1995 How to write a proof. Am. Math. Monthly 102, 600–608.

Martin-Löf, P. 1984 Intuitionistic type theory. Studies in proof theory, Lecture notes 1. Naples: Bibliopolis. Notes by Giovanni Sambin.

Miller, G. 1976 Riemann’s hypothesis and tests for primality. J. Comp. Syst. Sci. 13, 300–317.

Moerdijk, I. & Palmgren, E. 2002 Type theories, toposes and constructive set theory: predicative aspects of AST. Ann. Pure Appl. Logic 114, 155–201. (doi:10.1016/S0168-0072(01)00079-3.)

Nederpelt, R. P., Geuvers, J. H. & de Vrijer, R. C. 1994 Twenty-five years of Automath research. In Selected papers on Automath, Stud. Logic Found. Math., vol. 133, pp. 3–54. Amsterdam: North-Holland.

Rabin, M. O. 1980 Probabilistic algorithm for testing primality. J. Number Theor. 12, 128–138. (doi:10.1016/0022-314X(80)90084-0.)

Robertson, N., Sanders, D. P., Seymour, P. & Thomas, R. 1996 A new proof of the four-colour theorem. Electron. Res. Announc. Am. Math. Soc. 2, 17–25. (doi:10.1090/S1079-6762-96-00003-0.) http://www.ams.org/era/1996-02-01/S1079-6762-96-00003-0/home.html.

Schönfinkel, M. 1924 Über die Bausteine der mathematischen Logik. Mathematische Annalen 92, 305–316. (doi:10.1007/BF01448013.)

Scott, D. 1970 Constructive validity. In Symposium on Automatic Demonstration (Versailles, 1968), Lecture Notes in Mathematics, vol. 125, pp. 237–275. Berlin: Springer.

Skolem, T. 1922 Über ganzzahlige Lösungen einer Klasse unbestimmter Gleichungen. Norsk Matematisk Forenings skrifter.

Sudan, G. 1927 Sur le nombre transfini ω^ω. Bulletin mathématique de la Société Roumaine des Sciences 30, 11–30.

Tarski, A. 1951 Decision method for elementary algebra and geometry. Berkeley: University of California Press.

Terese (ed.) 2003 Term rewriting systems. Cambridge: Cambridge University Press.

Turing, A. M. 1936 On computable numbers, with an application to the Entscheidungsproblem. Proc. Lond. Math. Soc. Ser. 2 42, 230–265.

Turing, A. M. 1949 Checking a large routine. In Report of a Conference on High Speed Automatic Calculating Machines, Paper for the EDSAC Inaugural Conference, 24 June 1949, pp. 67–69.

Veltman, M. 1967 SCHOONSCHIP, a CDC 6600 program for symbolic evaluation of algebraic expressions. Technical Report, CERN.

Wang, H. 1997 A logical journey: from Gödel to philosophy. Cambridge, MA: MIT Press (Bradford Books).

Wiedijk, F. 2000 The de Bruijn factor. http://www.cs.ru.nl/~freek/notes/factor.ps.gz.

Wiedijk, F. 2004 Formal proof sketches. In Types for proofs and programs: Third International Workshop, TYPES 2003, Torino, Italy, April 30–May 4, 2003, Revised Selected Papers, LNCS 3085 (ed. Stefano Berardi, Mario Coppo & Ferruccio Damiani). Berlin: Springer.


Discussion

N. SHAH (British Society of History of Mathematics, Durham, UK). Automated reasoning systems either use logic or intuitionistic type theory. Which (in the speaker’s opinion) will win out?

D. H. BARENDREGT. Intuitionistic type theory also uses logic, just a more explicit one. You are probably asking whether the classical logic of some systems will win out over the intuitionistic one used in others. Good question! I think that when the technology gains momentum the classical systems will be in the majority, but that in the long term the intuitionistic ones will win. After all, they can also work in a classical mode by assuming the law of the excluded middle.

A. BUNDY (School of Informatics, University of Edinburgh, UK). If the mathematicians in the audience accept your argument then they will start to use automatic theorem provers. How long will it take before this becomes commonplace?

D. H. BARENDREGT. Pessimists think in 50 years; optimists in 10 years.

A. V. BOROVIK (School of Mathematics, University of Manchester, UK). What are the advantages of deterministic proof checkers over simpler and quicker non-deterministic procedures? If a non-deterministic procedure confirms the validity of the statement with probability of error less than one in ten, after repeating it 100 times we have the probability of error in the non-deterministic judgement being less than one in ten to the power of one hundred, which is smaller than the probability of hardware error in the computer.

D. H. BARENDREGT. For practical purposes I do dare to step in an airplane with that low chance of falling down. So it is a matter of aesthetics. Nevertheless, the method of proof-checking applies to the correctness of your non-deterministic procedure as well.

A. J. MACINTYRE (School of Mathematical Sciences, Queen Mary, University of London, UK). The Isabelle proof of the Prime Number Theorem is based on the elementary proof. This proof is regarded by mathematicians as less illuminating than the complex variable proof. When will Isabelle be able to do the latter proof? Is the ‘library’ for a Master’s Programme realistic without this?

D. H. BARENDREGT. We need more formalized libraries and, in order to get these, more certified tools. When this will happen depends on how hard we as a community work. Complex variables should definitely be in the ‘library’ of a certified Master’s.

P. H. HINES (retired). ‘Romantic’ proof inspires other mathematicians. But cool/computer proof does not.

D. H. BARENDREGT. Cool proofs have a romantic flavour of their own. Some centuries ago biologists got excited about flowers and bees. This is still the case. But now they also get excited about genome sequences.

M. ATIYAH (Department of Mathematics & Statistics, University of Edinburgh, UK). I can understand how a computer could check a proof, or even develop in detail a proof which was only outlined. It would do what a competent research


student could undertake. But in the real mathematical world the proof, or even the formulation of a theorem, is rarely known in advance, even in outline. It is hard to see how a computer can assist in such an ill-defined process.

D. H. BARENDREGT. The human provides the intuition, and then wants to check that intuition by constructing proofs. At first the proofs are sketchy. It is at this phase that the mathematical assistants can help.


What is a proof?

BY ALAN BUNDY1, MATEJA JAMNIK2 AND ANDREW FUGARD1

1School of Informatics, University of Edinburgh, Appleton Tower, Crichton Street, Edinburgh EH8 9LE, UK ([email protected])

2University of Cambridge Computer Laboratory, J. J. Thomson Avenue, Cambridge CB3 0FD, UK

To those brought up in a logic-based tradition there seems to be a simple and clear definition of proof. But this is largely a twentieth century invention; many earlier proofs had a different nature. We will look particularly at the faulty proof of Euler’s Theorem and Lakatos’ rational reconstruction of the history of this proof. We will ask: how is it possible for the errors in a faulty proof to remain undetected for several years, even when counter-examples to it are known? How is it possible to have a proof about concepts that are only partially defined? And can we give a logic-based account of such phenomena? We introduce the concept of schematic proofs and argue that they offer a possible cognitive model for the human construction of proofs in mathematics. In particular, we show how they can account for persistent errors in proofs.

Keywords: mathematical proof; automated theorem proving; schematic proof; constructive omega rule

1. Introduction

To those brought up in a logic-based tradition there seems to be a simple and clear definition of proof. Paraphrasing Hilbert (Hilbert 1930):

A proof is a sequence of formulae each of which is either an axiom or followsfrom earlier formulae by a rule of inference.

Let us call a proof in this format Hilbertian.

But formal logic and its Hilbertian view of proof is largely a twentieth century invention. It was invented to help avoid erroneous proofs and to enable proofs about proofs, for instance Gödel’s proof of the incompleteness of arithmetic (Gödel 1931). Formal logic has since become the basis for automated theorem proving.

Prior to the invention of formal logic, a proof was any convincing argument. Indeed, it still is. Presenting proofs in Hilbertian style has never taken off within the mathematical community. Instead, mathematicians write rigorous proofs, i.e. proofs in whose soundness the mathematical community has confidence, but which are not Hilbertian.

Phil. Trans. R. Soc. A (2005) 363, 2377–2391

doi:10.1098/rsta.2005.1651

Published online 14 September 2005

One contribution of 13 to a Discussion Meeting Issue ‘The nature of mathematical proof ’.

© 2005 The Royal Society

To see that rigorous proofs are not Hilbertian, consider erroneous proofs. The history of mathematics is full of erroneous proofs. The faults in some of these proofs have remained undetected or uncorrected for many years. However, Hilbertian proofs can be readily checked. We merely need to ask of each formula in the proof: ‘is it an axiom?’ or ‘does it follow from earlier formulas by a rule of inference?’. Such checking can be readily automated and, indeed, it often has been. But even in the age before computers, postgraduate students could have carried out the necessary checks as an apprentice exercise. Mathematical proofs are subjected to a lot of checking, so we can conclude that proofs containing persistent errors are unlikely to be Hilbertian. If a Hilbertian proof contained an error, it would surely be quickly detected and corrected.

This may lead us to ask, what are the alternatives to Hilbertian proof presentation? Can these alternative presentations help us to understand how a fault can lie undetected or uncorrected for a long time? Could we formalize these alternative presentations? Could we automate such mathematical reasoning?

We will present an alternative method of proof presentation that we call schematic proof. Schematic proof is partly inspired by Lakatos’s rational reconstruction of the history of Euler’s Theorem (Lakatos 1976). It has been automated in two projects at Edinburgh, using the constructive ω-rule (Baker 1993; Jamnik 2001).

We argue that schematic proofs offer a possible cognitive model for the human construction of proofs in mathematics. In particular, we show how they can account for persistent errors in proofs. They also help explain a key role for examples in the construction of proofs and a paradox in the relative obviousness of different, but related, theorems.

2. Lakatos’s discussion of Euler’s Theorem

Euler’s Theorem1 states that, in any polyhedron, V − E + F = 2, where V is the number of vertexes, E is the number of edges and F is the number of faces. This theorem is illustrated in the case of the cube in figure 1.

In (Lakatos 1976), Imre Lakatos gives a rational reconstruction of the history of Euler’s Theorem. This history is set in a fictitious classroom of extremely bright students and a teacher. The students adopt different roles in the history of the evolution of mathematical methodology. The teacher leads them through this

Figure 1. Euler’s Theorem in the case of the cube. In a cube there are 8 vertexes, 12 edges and 6 faces. So V − E + F = 8 − 12 + 6 = 2.

1 More properly, this would be called ‘Euler’s Conjecture’, since he proposed, but did not prove, it.


history. To initiate the discussion, the teacher presents a ‘proof’ of Euler’s Theorem due to Cauchy, which we reproduce in figure 2. Later, the teacher and the students discover various counter-examples to the Theorem. They use these counter-examples to analyse the faulty proof and propose a wide variety of different methods for dealing with the conflict between alleged proofs and counter-examples. These methods involve refining definitions, correcting proofs, modifying conjectures, etc. Lakatos’s mission was to show how the methodology of mathematics had evolved: becoming more sophisticated in its handling of proofs and refutations. Our mission is different: we will analyse the proof method underlying Cauchy’s faulty proof and show how errors can arise from the use of this proof method.

(a) Cauchy’s ‘proof’ of Euler’s Theorem

Lakatos’s account of Cauchy’s ‘proof’ is illustrated in figure 2 for the case of the cube. The general ‘proof’ is given in theorem 2.1.

Theorem 2.1. For any polyhedron, V − E + F = 2, where V is the number of vertexes, E is the number of edges and F is the number of faces.

Cauchy’s ‘proof ’. Given a polyhedron, carry out the following steps.

(i) Remove one face and stretch the other faces onto the plane. Note that F has diminished by 1, but that V and E are unchanged. So we are required to prove that V − E + F = 1.

(ii) Triangulate the remaining faces by drawing diagonals. Note that each new diagonal increases both E and F by 1, but leaves V unchanged. So V − E + F is unaffected.

(iii) Remove the triangles one by one. There are two cases to consider, illustrated by step (iii) in figure 2. In the first case, we remove an edge, so that both E and F decrease by 1. In the second case, we remove two edges and a vertex, so that both V and F decrease by 1, but E decreases by 2. In either case, V − E + F = 1 is unaffected.


Figure 2. Cauchy’s ‘proof’ applied to the cube. In step (i), one face of the cube is removed and the remaining faces are stretched onto the plane. In step (ii), these faces are triangulated to break them into triangles. In step (iii), these triangles are removed one by one. Two cases of step (iii) can arise and are illustrated. The figures are adapted from Lakatos (1976).


Finally, we are left with one triangle. In a triangle, V = 3, E = 3 and F = 1, so V − E + F = 1, as required. □
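The bookkeeping in this ‘proof’ is easy to make concrete. The following sketch (ours, not from the paper; we use Haskell for such illustrations throughout this transcript) models each step purely as an operation on the counts (V, E, F). Precisely because it models only the counts, it cannot express the geometric assumption, that the remaining faces really can be stretched onto the plane in step (i), which is where the ‘proof’ actually fails.

type Counts = (Int, Int, Int)   -- (V, E, F)

invariant :: Counts -> Int
invariant (v, e, f) = v - e + f

removeFace, addDiagonal :: Counts -> Counts
removeFace  (v, e, f) = (v, e, f - 1)       -- step (i): F drops by 1
addDiagonal (v, e, f) = (v, e + 1, f + 1)   -- step (ii): E and F grow by 1

removeTriangleCase1, removeTriangleCase2 :: Counts -> Counts
removeTriangleCase1 (v, e, f) = (v, e - 1, f - 1)      -- step (iii), first case
removeTriangleCase2 (v, e, f) = (v - 1, e - 2, f - 1)  -- step (iii), second case

-- For the cube, invariant (8, 12, 6) == 2; after removeFace the invariant
-- is 1, and every later step preserves it, down to the triangle (3, 3, 1).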

(b) A counter-example to Euler’s Theorem

Many people found Cauchy’s ‘proof’ convincing. However, as Lakatos illustrates, eventually many counter-examples were found. The simplest is the hollow cube, illustrated in figure 3.

Lakatos reports the reaction to this counter-example as a debate about whether the hollow cube is really a polyhedron. He offers two possible definitions of polyhedron, providing two opposite answers to this question.

Definition 1. A polyhedron is a solid whose surface consists of polygonal faces. Under this definition, the hollow cube is a polyhedron.

Definition 2. A polyhedron is a surface consisting of a system of polygons. Under this definition, the hollow cube is not a polyhedron.

It is interesting to ask how it is possible for Cauchy to have offered a proof about polyhedra, when the question of their definition was still open.2 This could not happen in Hilbertian proofs: definitions are axioms and must come first. What kind of proof might allow us to keep the definitions open?

3. Schematic proofs

Note that the proof of theorem 2.1 is a procedure: given a polyhedron, a series of operations is specified, whose application will reduce the polyhedron to the triangle. The value of V − E + F is tracked during these operations. The actual number of operations to be applied will vary depending on the input polyhedron. This is very unlike a Hilbertian proof, which is not procedural and in which the same number of proof steps is used for all examples.

Independently of (Lakatos 1976), the Mathematical Reasoning Group at Edinburgh became interested in the constructive ω-rule. We subsequently realized that this rule generates just the kind of proof used above to prove theorem 2.1. We call this kind of proof schematic. Schematic proofs are procedures that, given an example, generate a proof, which is specific to that example. The number of steps in each proof depends on the example.

Figure 3. The hollow cube: a counter-example to Cauchy’s ‘proof’. The hollow cube is a cube with a cubical hole in the middle. The values of V, E and F are all doubled. So V − E + F = 16 − 24 + 12 = 4.

2 Nor is it yet closed, since terms such as surface, system, etc. have still to be defined.


(a) The constructive ω-rule

The ω-rule for the natural numbers 0, 1, 2, … is

\[ \frac{\phi(0), \quad \phi(1), \quad \phi(2), \quad \ldots}{\forall x.\ \phi(x)} \]

i.e. we can infer that φ(x) for all natural numbers x provided we can prove φ(n) for n = 0, 1, 2, …. The ω-rule is clearly not a very practical rule of inference, since it requires the proof of an infinite number of premises to prove its conclusion. A Hilbertian proof using it would consist of an infinite sequence of formulas. Its use is usually confined to theoretical discussions. It was first described in a published work by Hilbert (1931).

The constructive ω-rule is a refinement of the ω-rule that can be used in practical proofs. It has the additional requirement that the φ(n) premises be proved in a uniform way, i.e. that there exists a recursive program, proof_φ, which takes a natural number n as input and returns a proof of φ(n) as output. We will write this as proof_φ(n) : φ(n). The recursive program proof_φ formalizes our notion of schematic proof. Applied to the domain of polyhedra, rather than natural numbers, it could be used to formalize Cauchy’s ‘proof’ of Euler’s Theorem given in theorem 2.1 in §2a.

(b) Implementation of the constructive ω-rule

To demonstrate its practicality as a rule of inference, we have implemented the constructive ω-rule within two automated theorem-proving systems. In outline, these automated theorem-provers use the following procedure.

(i) Start with some proofs for specific examples, e.g. proof_φ(3) : φ(3), proof_φ(4) : φ(4).
(ii) Generalize these proofs of examples to obtain a recursive program: proof_φ.
(iii) Verify, by induction, that this program constructs a proof for each n:

\[ proof_\phi(0) : \phi(0), \qquad proof_\phi(n) : \phi(n) \;\vdash\; proof_\phi(n+1) : \phi(n+1). \]

At first sight it may appear that step (iii) replaces a Hilbertian, object-level induction with an isomorphic meta-level induction. Siani Baker’s work (Baker 1993), however, shows that the meta-level induction is not isomorphic to the object-level one; often it is much simpler. Her work revealed many examples in which the object-level proof required generalization or intermediate lemmas, but the meta-level proof did not, i.e. the meta-level proof was inherently simpler.

Note that Cauchy’s ‘proof’ omits the final verification step (iii) in the above procedure. He is trusting that his program for reducing polyhedra to triangles will work for all polyhedra. As we have seen, it does not. This helps explain the error in his proof.

Both of our implementations of the constructive ω-rule were for the natural numbers: Siani Baker’s for simple arithmetic theorems (Baker 1993) and the second author’s for a form of diagrammatic reasoning (Jamnik 2001), as described in Roger Nelsen’s book ‘Proofs without words’ (Nelsen 1993). Both our


implementations included the final verification step, so were guaranteed to produce only correct schematic proofs.

An example schematic proof from Baker’s work is given in figure 4. Rather than describe the program in a programming language, we try to capture the infinite family of proofs that it outputs, using ellipses to indicate those steps and expressions that occur a variable number of times.
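The figure 4 proof family can also be captured as a recursive program in the sense of proof_φ. The sketch below is ours, not Baker’s implementation: given the instance m (i.e. x = s^m(0)), it returns the sequence of rewrite rules proving (x + y) + z = x + (y + z), namely m applications of the step case followed by one application of the base case.

-- A minimal proof_phi-style generator for the figure 4 family (our
-- sketch, not Baker's program). The proof for instance m+1 is built
-- from the proof for instance m, mirroring the meta-level recursion.
data Rule = PlusS   -- step case: s(x) + y = s(x + y)
          | PlusB   -- base case: 0 + y = y
          deriving Show

proofAssoc :: Int -> [Rule]
proofAssoc 0 = [PlusB]
proofAssoc m = PlusS : proofAssoc (m - 1)

-- e.g. proofAssoc 2 == [PlusS, PlusS, PlusB]: the length of the proof
-- depends on the instance m, unlike a single Hilbertian proof.

Step (iii) of the procedure above would then verify, by induction on m, that proofAssoc m really does prove instance m.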

An example proof from the second author’s work is shown in figure 5. Her program, Diamond, was shown proofs for two numbers and then generalized these into a program for generating the proof for any number.

4. The relative difficulty of proofs

Proof by induction is the Hilbertian alternative to schematic proofs of theorems about recursive data-types, such as the natural numbers. We have also investigated the automation of inductive proofs (Bundy 2001). However, evidence arising from these investigations suggests that humans do not use inductive proof when assessing the truth or falsity of conjectures. This is perhaps not surprising, since the formalization of mathematical induction is a relatively modern development.3 Schematic proof is an alternative candidate model of the mechanism humans use to prove conjectures over infinite domains.

To illustrate the evidence against inductive proof as a cognitive model, consider the rotate-length theorem on lists:

\[ \forall l \in list(\tau).\ rot(len(l), l) = l, \tag{4.1} \]
where l is a list of elements of type τ, len is a unary function that takes a list l and returns its length and rot is a binary function that takes a number n and list l and rotates the first n elements of l from the front to the back. The recursive definitions of len and rot are given in figure 6. Also defined is a binary function <>, which takes two lists and appends them together. <> is an auxiliary function in the definition of rot. The most straightforward induction rule for

Figure 4. Schematic proof of the associativity of +. s(n) is the successor function for natural numbers, intuitively meaning n + 1. s^m(0) means s applied m times to 0. Addition is defined recursively using two equations: the base case 0 + y = y and the step case s(x) + y = s(x + y). +s is rewriting, left to right, using this step case; +b is rewriting using the base case. Note that the number of applications of +s depends on m.

3 It is usually attributed to Richard Dedekind in 1887, although informal uses of induction date back as far as Euclid.


lists is
\[ \frac{F([\,]) \qquad \forall h \in \tau.\ \forall t \in list(\tau).\ F(t) \rightarrow F([h|t])}{\forall l \in list(\tau).\ F(l)}, \]
where [ ] is the empty list, [h|t] places an h at the front of a list t, τ is an arbitrary type and list(τ) is the type of lists of elements of type τ.
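Read computationally, this rule is just the structural recursion principle for lists. The following gloss (ours, in Haskell) packages a base case and a step case in exactly the shape of the rule, much as the standard foldr does:

-- listInd base step : establishes F(l) for every l, assembled from a
-- proof of F([]) and a step taking F(t) to F([h|t]).
listInd :: b -> (a -> [a] -> b -> b) -> [a] -> b
listInd base _    []      = base
listInd base step (h : t) = step h t (listInd base step t)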

Having absorbed the definitions, most people will readily agree that the rotate-length theorem, stated in equation (4.1), is true. However, its inductive proof is surprisingly difficult. It cannot be proved directly from the recursive definitions in figure 6. The proof requires the use of auxiliary lemmas, the generalization of the theorem and/or the use of a more elaborate form of induction. For instance, one way to prove the rotate-length theorem is first to generalize it to

\[ \forall k \in list(\tau).\ \forall l \in list(\tau).\ rot(len(l), l <> k) = k <> l. \tag{4.2} \]
Although people will also readily agree that equation (4.2) is true, they find this assessment a little more difficult than that of equation (4.1).

So, if people are using induction to assess conjectures such as equations (4.1) and (4.2), even if unconsciously, then we are faced with a paradox: what appears to be a fairly easy assessment of equation (4.1) entails what appears to be the harder assessment of equation (4.2). Moreover, the inductive proof is quite difficult, requiring, for instance, the generalization of the initial conjecture and the speculation and proof of a couple of intermediate lemmas (or some alternative but similarly complex processes) (Bundy 2001, §§ 6.3.3 and 6.2.2). This phenomenon is not rare. On the contrary, we have found lots of similar examples, where an intuitively obvious conjecture has only a complex inductive proof, requiring generalization, lemmas, non-standard inductions or a mixture of these. The intermediate generalizations, lemmas, etc. are often harder to assess than the original conjecture.

The schematic proof of the rotate-length theorem is given in figure 7. This does not require any generalizations or intermediate lemmas and is fairly straightforward. The schematic proof can be viewed as evaluating the theorem on a generic list using the recursive definitions. In informal experiments, when we have asked subjects what mechanism they used to assess the truth/falsity of this theorem, they report a process that resembles this schematic proof,4 making it a candidate for a cognitive model.

Figure 5. A proof without words. The diagram gives a proof of the theorem n² = 1 + 3 + ⋯ + (2n − 1). The diagram can be viewed as describing both the left- and right-hand sides of the equation. The whole square represents n². Each of the L-shapes represents one of the odd numbers summed on the right-hand side.
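For comparison with the picture, the identity in figure 5 also has a one-line algebraic verification:

\[ \sum_{k=1}^{n} (2k - 1) = 2 \sum_{k=1}^{n} k - n = 2 \cdot \frac{n(n+1)}{2} - n = n^2. \]

The point of Diamond’s proofs, however, is that they are generalized from instances of the picture, not derived algebraically.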

4 For instance, Aaron Sloman, personal communication.


5. Schematic proofs as a cognitive model

We hypothesize that schematic proofs provide a cognitive model of some mathematical proofs. We are soon to start a project to test this hypothesis. It appears to provide a good fit to procedural-style proofs, such as Cauchy’s ‘proof’ of Euler’s Theorem and some informal justifications of inductive conjectures. It gives an account of how certain kinds of errors might occur: if the final verification step is omitted, then it is a matter of luck whether a schematic proof will produce a sound proof for every example. If the number of possible examples is infinite or very large, it may take some considerable time before someone comes across an example for which the schematic proof fails. It may be difficult to tell why the schematic proof fails on this example and how to repair it.

The constructive ω-rule provides a logic-based, formal account of schematic proof, as an alternative to the standard Hilbertian account. This gives us a sound mathematical basis to investigate rigorous,5 as opposed to Hilbertian, proof. It fills a gap in MacKenzie’s classification of proofs (this volume) by providing an example of a rigorous but mechanizable style of proof.

To confirm our hypothesis, we will have to carry out a series of psychological experiments to compare human mathematical performance with our implementations and formal accounts of schematic proof. Here is an outline of the kind of experiment we might perform.

(i) Present theorems and near-miss non-theorems to human mathematicians.
(ii) Ask for both their judgement of the truth/falsity of each example, together with a justification of that decision.
(iii) Investigate how subjects explore and absorb the meaning of recursive definitions. Do they try example instances? Do they reason inductively? We may provide a computational tool that will support subjects’ exploration and keep a record of these explorations as experimental data.
(iv) Try to model these justifications and explorations computationally.

Figure 6. The recursive definitions of some functions. Each definition consists of one or more base and step cases. The base cases define the function for one or more initial values. The step cases define the function for constructed values in terms of the parts from which they are constructed.
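The figure itself does not survive this transcription, but the text pins the definitions down. A plausible Haskell rendering (ours; the paper’s append operator <> is renamed <+> here only to avoid the clash with Haskell’s built-in <>) is:

len :: [a] -> Int
len []      = 0                 -- base case
len (_ : t) = 1 + len t         -- step case

(<+>) :: [a] -> [a] -> [a]      -- append, written <> in the paper
[]      <+> k = k               -- base case
(h : t) <+> k = h : (t <+> k)   -- step case

rot :: Int -> [a] -> [a]
rot 0 l       = l               -- base case
rot _ []      = []              -- base case
rot n (h : t) = rot (n - 1) (t <+> [h])   -- step case: head moves to the back

-- The rotate-length theorem (4.1): rot (len l) l == l, and its
-- generalization (4.2): rot (len l) (l <+> k) == k <+> l, e.g.
--   rot (len "abc") ("abc" <+> "xy") == "xy" <+> "abc"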

5 Rigorous proofs are sometimes called ‘informal’, but this is a misleading description since, as we have seen, they can be formalized.


(v) Decide whether schematic proof or a Hilbertian proof provides a better explanation of these justifications or whether neither is a good fit and there is a third possibility.6 It may be that subjects’ preferences vary according to their reasoning style. We may want to apply some pre-tests to classify the subjects’ reasoning styles.

There has been some previous work on building computational models for how students recognize patterns in numbers and construct functions to generate the numbers (Haverty et al. 2000). This work could provide guidance on our experimental design, for instance, asking the students to speak aloud during the experiment, then analysing the resulting protocols for clues about internal processing.

We also plan to conduct an investigation of some historical proofs, especially those that were later found to be faulty, to see if schematic proof can offer an explanation of the mechanism used and help explain why errors were made that were hard to detect or correct.

6. Discussion

In this section, we discuss some related work.

(a) Comparison to type theory

There are superficial similarities between the proposals here, to use the constructive ω-rule to produce schematic proofs, and the formal representation of proofs in constructive type theory (Martin-Löf 1984). Both, for instance, associate, with each theorem, an object that can be interpreted as both a program and a proof. However, there are also significant differences.

Figure 7. A schematic proof of the rotate-length theorem. <>s, lens and rots refer to rewriting using the step cases of the definitions of these functions. Similarly, <>b, lenb and rotb refer to rewriting using the base cases. Note that the number of times these rules are applied depends on n, the length of the list. However, only rewriting with the recursive definitions is required.

6 Another potential candidate is a mechanism we have investigated for reasoning with ellipsis (Bundy & Richardson 1999).


Firstly, the object associated with a theorem in type theory can be interpreted both as a program constructed by the proof and as a verification proof of the correctness of that program and of the theorem. Moreover, the type theory proof proves the whole theorem. On the other hand, the program associated with a theorem by the constructive ω-rule is not a general proof of the theorem; it is a program which will generate a putative proof for each instance of the theorem. Note that members of this family of proofs can contain different numbers of steps.

Secondly, the program in type theory is generated by the process of proving the theorem. Its correctness is guaranteed by the soundness of type theory. The program in the constructive ω-rule is generated by inductive generalization from a few example proofs of theorem instances. As we have shown, its correctness is not guaranteed; an additional meta-level proof is required to establish this. This meta-level proof has no counterpart in type theory.

(b) Rigorous proof as Hilbertian proof highlights

Many accounts of rigorous proof implicitly or explicitly adopt the position that a rigorous proof is essentially a Hilbertian proof, but with steps missing; possibly, 90% or more of the steps. This is probably a valid account of many human-constructed proofs. We must then ask how errors may arise in such proofs and how these errors may lie undetected.

It is not the case that mathematicians first produce the Hilbertian proofs and then summarize them for publication by eliding 90%+ of the steps. Firstly, this explanation is at variance with accounts of proof discovery. Secondly, the few attempts to turn rigorous proofs into Hilbertian proofs often reveal errors. For instance, Jacques Fleuriot’s formalization of Newton’s proofs of Kepler’s Laws, using the Isabelle prover and non-standard analysis (Fleuriot 2001), revealed an error in Newton’s manipulation of infinitesimal numbers. Even Hilbert himself was not immune. Laura Meikle’s Isabelle formalization of Hilbert’s Grundlagen (Meikle & Fleuriot 2003) revealed that Hilbert had appealed to the semantics of the geometry domain rather than just the axioms and rules of the formal theory. Thirdly, rigorous proofs often prove unreasonably hard to check. We have argued that checking Hilbertian proofs should be routine: even Hilbertian proofs with many missing steps. At the time of writing, putative proofs of both the Poincaré Conjecture and the Kepler Conjecture are undergoing extensive checking.

Human accounts of proof discovery suggest that mathematicians first form a plan, which they then unpack until they are satisfied of the truth of each proof step. This process of proof discovery can also be automated using the technique of proof planning (Bundy 1991). However, in our automation of proof planning, the plan is unpacked into a Hilbertian proof. Humans stop short of this level of unpacking. It would be interesting to investigate how they decide when to stop. A possibility worth investigation is that schematic proof is used at the leaves of the proof plan, i.e. that the proof plan is unpacked until the remaining subgoals can be checked against a few well-chosen examples. This would explain how errors could be introduced into the proof plan. It also unites our two rival accounts of rigorous proof.


7. Conclusion

The standard Hilbertian account of mathematical proof fails to model some historically important proofs, to account for the possibility of undetected and uncorrected error and to account for the relative difficulty of proofs. Schematic proof provides an alternative account of proof that does address these issues. Schematic proofs are based on the constructive ω-rule, which provides a formal, logic-based foundation. This, for instance, enables us to automate the construction and application of schematic proofs. Schematic proof provides a link to computer program verification, in which an invariant formula, analogous to V − E + F, is shown to be preserved by successive computational operations. Just like Cauchy, programmers who do not verify their programs run the risk that their systems will fail on unforeseen inputs. The process of forming schematic proofs by generalizing from examples provides a key role for examples in the construction of proofs. This may go some way to explain why humans find models, diagrams, etc. so valuable during proof discovery.

We are now planning to conduct some psychological investigations into the extent to which schematic proofs can account for the mechanisms of human proof discovery.

The research reported in this paper was supported by EPSRC grants GR/S01771 and GR/S31099 (Bundy), an EPSRC Advanced Research Fellowship GR/R76783 (Jamnik) and a Swedish Institute Guest Scholarship and EPSRC/MRC Neuroinformatics Studentship EP/C51291X/1 (Fugard). We are grateful to Jürgen Zimmer for help with (Hilbert 1930) and to Torkel Franzén for discussions about the ω-rule.

References

Baker, S. 1993 Aspects of the constructive omega rule within automated deduction. Ph.D. thesis, University of Edinburgh, UK.

Bundy, A. 1991 A science of reasoning. In Computational logic: essays in honor of Alan Robinson (ed. J.-L. Lassez & G. Plotkin), pp. 178–198. Cambridge, MA: MIT Press.

Bundy, A. 2001 The automation of proof by mathematical induction. In Handbook of automated reasoning (ed. A. Robinson & A. Voronkov), vol. 1, pp. 845–911. Amsterdam: Elsevier.

Bundy, A. & Richardson, J. 1999 Proofs about lists using ellipsis. In Proc. 6th Int. Conf. on Logic for Programming and Automated Reasoning, LPAR (ed. H. Ganzinger, D. McAllester & A. Voronkov), Lecture Notes in Artificial Intelligence, no. 1705, pp. 1–12. Berlin: Springer.

Fleuriot, J. 2001 A combination of geometry theorem proving and nonstandard analysis, with application to Newton’s Principia. Distinguished dissertations. Berlin: Springer.

Gödel, K. 1931 Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatsh. Math. Phys. 38, 173–198. (doi:10.1007/BF01700692.) (English translation in Heijenoort 1967.)

Haverty, L. A., Koedinger, K. R., Klahr, D. & Alibali, M. W. 2000 Solving inductive reasoning problems in mathematics: not-so-trivial pursuit. Cogn. Sci. 24, 249–298. (doi:10.1016/S0364-0213(00)00019-7.)

Heijenoort, J. V. 1967 From Frege to Gödel: a source book in mathematical logic, 1879–1931. Cambridge, MA: Harvard University Press.

Hilbert, D. 1930 Die Grundlegung der elementaren Zahlenlehre. Mathematische Annalen 104, 485–494. (doi:10.1007/BF01457953.)

Hilbert, D. 1931 Beweis des Tertium non datur. Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, pp. 120–125.

Jamnik, M. 2001 Mathematical reasoning with diagrams: from intuition to automation. Stanford, CA: CSLI Press.

Lakatos, I. 1976 Proofs and refutations: the logic of mathematical discovery. Cambridge, UK: Cambridge University Press.

Martin-Löf, P. 1984 Intuitionistic type theory. Studies in proof theory, Lecture Notes, vol. 1. Naples: Bibliopolis.

Meikle, L. I. & Fleuriot, J. D. 2003 Formalizing Hilbert’s Grundlagen in Isabelle/Isar. In Theorem Proving in Higher Order Logics: 16th International Conference, TPHOLs 2003, Springer Lecture Notes in Computer Science, vol. 2758, pp. 319–334. Berlin: Springer.

Nelsen, R. 1993 Proofs without words: exercises in visual thinking. Washington, DC: Mathematical Association of America.

Discussion

S. COLTON (Department of Computing, Imperial College London, UK). In the (admittedly few) times I have undertaken research mathematics, half of the time I have progressed in the style of Fermat, Goldbach and Euler, i.e. I have noticed a pattern, then proved that the pattern is no coincidence. However, half the time, the theorem statement emerged at the same time that I finished the proof, because when I started, I did not know exactly what I wanted to prove. I worry that in asking the question ‘What is proof?’, this emphasizes a distinction between theorem and proof, whereas often, they should not be differentiated. I was wondering what your thoughts were about this.

A. BUNDY. I strongly agree that the development of the theorem and the development of the proof often proceed in parallel. Indeed, in axiomatic theories, the development of the theory and the definition of the concepts also sometimes proceed in parallel with proof and theorem development. In automatic theorem proving this mutability of definitions, theorem and theory is usually neglected: the theory and theorem are taken as given and fixed. I have always been interested in the potential for interaction between axioms, definitions, theorems and proofs and hope to investigate this further in my future research. For instance, I previously conducted joint research with Raul Monroy on the automatic detection and correction of false conjectures, using a failed proof attempt to guide the correction process. Currently, I have a project with Fiona McNeill that uses failures in proof to change the underlying representation of the theory. I have a project proposal to build a proof management system that will assist users to manage a large corpus of proofs, including adapting the proofs to accommodate changes in axioms and definitions.

E. B. DAVIES (Department of Mathematics, King’s College London, UK). In some areas of mathematics almost the entire interest consists in finding appropriate generalizations of simple examples, guided by what is possible and what seems natural, beautiful and simple. Different mathematicians have frequently been led to quite different generalizations of the same theorem. Checking the correctness of the proofs is not the interesting part of the subject. Do you have any comments?

A. BUNDY. More generally, I think that mathematical reasoning consists of the interaction between a heterogeneous collection of processes: theory development, conjecture making, counter-example finding, formalization of informal problems,


calculation, as well as proof discovery. Among these processes, proof discovery has received a disproportionate amount of attention from the automated reasoning community. In my group’s research, we have tried to investigate a wider range of mathematical processes. For instance, Simon Colton has built the HR system for theory development, including the formation of concepts and conjectures by generalization from examples.

T. ANDERSON (CSR, School of Computing Science, University of Newcastle, UK). Must we consign mathematics to the dustbin until computers have confirmed the validity of the theorems and proofs?

A. BUNDY. What an extraordinary idea! Firstly, Lipton has argued in this meeting that the correctness of a proof is confirmed by a social process of interaction between mathematicians. We might want to integrate our computer systems into this social process so that they played a complementary role to the human mathematicians, but that requires solving very hard problems about the accessibility of and interaction with computer proof. Secondly, automated theorem proving is not yet capable of dealing with most state-of-the-art mathematics, but needs augmenting with human guidance, which calls for skills that are in short supply. Thirdly, even if the propensity of humans to err means we cannot be 100% confident of the correctness of a proof, a human-generated proof could still form the starting point for a computer verification; indeed, most interactive proof case studies have taken this form. Fourthly, human mathematicians find mathematics fun. Why would they stop having fun? I think we need to find niches where automated mathematical reasoning systems can play a useful complementary role to human mathematicians, e.g. in checking very large or complicated, but elementary, proofs, in computer-aided teaching tools, in paper writing and refereeing aids, etc. We can also use automated reasoning as a modelling mechanism to try to understand human reasoning.

N. SHAH (British Society of History of Mathematics, Durham, UK). Professor Bundy mentioned that Newton’s Principia had a flaw. The flaw was addressed (as I understand it) by Berkeley and the axiom of the first and last ratio.

A. BUNDY. Bishop Berkeley addressed some general worries about the consistency of reasoning with infinitesimals. As a result of such criticisms, the use of infinitesimals in calculus was replaced by the epsilon/delta arguments of real analysis. Interestingly, work by the logician Abraham Robinson and others in the 1960s provided a consistent theory of infinitesimal and infinite numbers, called non-standard analysis, within which the arguments about infinitesimals in the Principia and elsewhere can be formulated without the risk of paradoxes. Jacques Fleuriot’s mechanization of parts of the Principia, as mentioned in my paper, was based on Robinson’s non-standard analysis. The particular flaw Fleuriot discovered in Newton’s ‘proof’ of Kepler’s laws of planetary motion was not found by Berkeley, nor, to the best of our knowledge, by anyone else in the 318 years since the Principia was published, even though Newton himself was aware of this kind of flaw (dividing both sides of an equation by an infinitesimal). This illustrates how the nit-picking detail required in computer proofs can help uncover problems that a human’s more cursory reading will skim over without noticing.


N. SHAH. Historically, mathematicians are interested in the intention behind the maths, and so long as the proof they submit can be repaired if wrong, mathematicians are not going to be convinced by proof-reasoning people that the i’s dotted/t’s crossed are the most important thing.

A. BUNDY. One of the most surprising lessons I have learnt from this meeting is the tolerance of mathematicians for minor errors in proofs. The clearest example of this was Michael Aschbacher’s estimate of the probability of error in the classification of finite groups as p = 1. His conclusion was not to reject the proof, but to assume that any error could be readily repaired, if necessary by adding a few more categories in the classification. I interpret this stance as evidence of the hierarchical nature of proofs: if the high-level structure is right then any error in the low-level detail can probably be repaired. My research group has exploited this hierarchical structure in our work on proof plans: a hierarchical way of describing computer proofs, which can be used to guide the automated search for a proof and to automatically repair failed proof attempts. Despite this tolerance for minor error, I am guessing that mathematicians would still welcome a tool that could detect, and maybe even repair, such errors.

M. ATIYAH (Department of Mathematics & Statistics, University of Edinburgh, UK). I think I have to defend Newton! Any error in his work on planetary orbits must surely have been of a minor kind, easily rectifiable. His results have been rederived by modern methods innumerable times and experimentally confirmed, so it is not significant if such a minor lapse actually went undetected until recent times.

A. BUNDY. This remark surely provides further evidence of the point made by the last questioner and in my response to it. Indeed, Newton’s error was of a minor kind. It was rectified by Fleuriot, who was able to replace the faulty step with a correct sub-proof. My Principia example was not meant to be a criticism of Newton. Rather, I was using it to illustrate both how hard it can be to detect minor errors, even in one of the world’s oldest and most well-known mathematics books, and how a computer proof can help us detect and repair such errors.

A more serious example arises from the more recent work of Fleuriot and his student Laura Meikle (a student helper at this meeting). They automated Hilbert’s Grundlagen, which was an attempt to formalize Euclidean Geometry without the use of geometric intuition. But the detailed computer reconstruction showed that Hilbert had used geometric intuition, although again the missing formal steps could be provided. However, although the errors were minor, their presence was not, since it undermined the raison d’être of Hilbert’s work.

R. POLLACK (School of Informatics, University of Edinburgh, UK). Questioners objected that the error in Newton’s Principia was minor, and after all, the theorem was correct, so nothing much was gained by Fleuriot discovering and fixing the error. But at the same time, questioners insisted on the importance of the deep understanding captured in the proof: this is trying to have your cake and eat it at the same time.

A. BUNDY. It all depends on what you mean by ‘deep’. You could argue that the deep understanding arises from the high-level structure of the proof, which


captures the essential intuition behind the proof, rather than the detailed low-level proof steps within which the error was found. On the other hand, many of the paradoxes that arose from the informal use of infinitesimals arose from just such faulty low-level proof steps. Any ‘deep understanding’ of the cause of these paradoxes would require the investigation of exactly these low-level steps.


Panellist position statement: some industrial experience with program verification

BY RODERICK CHAPMAN

Praxis High Integrity Systems ([email protected])

As the only obvious ‘industrial’ member of the panel, I would like to introduce myself and the work I am involved with. Praxis is a practising software engineering company that is well known for applying so-called ‘Formal Methods’ in the development of high-integrity software systems. We are also responsible for the SPARK programming language and verification tools (John Barnes with Praxis High Integrity Systems 2003). SPARK

remains one of the very few technologies to offer a sound verification system for an industrially usable imperative programming language. Despite the popular belief that ‘no one does formal methods’, we (and our customers) regularly employ strong verification techniques on industrial-scale software systems.

I would like to address three main points:

1. What can and do we prove about programs?

‘Proof’ of computer programs is often seen as an arduous ‘all or nothing’ exercise only advocated by particularly sadistic university professors. Not so. SPARK offers a range of analyses that verify simple program properties (such as freedom from aliasing and data-flow errors) up to full verification of partial correctness with respect to some suitable specification. In the middle of this spectrum we have verification of ‘no run-time errors’ (such as division-by-zero or the ubiquitous ‘buffer overflow’) and the verification of ‘interesting’ safety and security invariants. A particular project will often mix and match these levels of verification, where they are needed most, depending on safety and/or security requirements.
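To give the flavour of the ‘no run-time errors’ level: for an update such as a(i) := x / y, the tool emits verification conditions saying that the divisor is non-zero and that the index is in range, which the theorem prover must then discharge from the surrounding context. The sketch below is only an illustration, written in Haskell for uniformity with the examples elsewhere in this transcript; SPARK itself is an annotated Ada subset, and this is not SPARK syntax.

-- Illustration only: the two guards play the role of the VCs that a
-- 'no run-time errors' analysis would generate for this update.
safeUpdate :: [Int] -> Int -> Int -> Int -> Maybe [Int]
safeUpdate a i x y
  | y == 0                 = Nothing   -- VC: divisor /= 0 (no division-by-zero)
  | i < 0 || i >= length a = Nothing   -- VC: index in bounds (no 'buffer overflow')
  | otherwise              = Just (take i a ++ [x `div` y] ++ drop (i + 1) a)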

These analyses scale up reasonably well. Proof of the absence of run-time errors has been performed on programs of the order of 100 000 lines of code. Partial correctness proofs have been developed for significant portions of 20 000 line programs (King et al. 2000). These are still one or two orders of magnitude below what is needed for ‘Grand Challenge’ size programs, but at least we have some starting point.

Completeness of the proof system remains an important issue. For run-time error proof, the theorem prover automatically discharges about 95% of verification conditions (VCs) for well-written programs. This also gives us a useful quality metric: if the theorem prover does not hit 95%, then your

Phil. Trans. R. Soc. A (2005) 363, 2393–2394

doi:10.1098/rsta.2005.1652

Published online 6 September 2005

One contribution of 13 to a Discussion Meeting Issue ‘The nature of mathematical proof’.

© 2005 The Royal Society

program is probably too complex or badly structured and should be re-written. Improving this completeness remains a fundamental research goal.

2. Development process and engineering behaviour

When engineers are first exposed to SPARK and its proof technology, they typically find it hard going. SPARK requires (indeed forces) a clarity and precision of expression that most programmers are unaccustomed to. Secondly, the availability of proof alters the normal software development process. In particular, code is not manually reviewed or tested until it has passed a defined level of automatic analysis. For example, a project might require over 95% of VCs automatically discharged prior to code review. This (eventually) changes the engineers’ behaviour: producing elegant, provable code becomes the norm. This investment in time ‘up front’ pays off later owing to the lower defect rate that results. SPARK has sometimes been described as a ‘talent normalization’ facility: poor programmers must pull their socks up, while the ‘gurus’ are forced to rein in their dark magical skills and produce code that someone else (or a machine) can actually understand.

Secondly, trying to produce code so rigorously often discovers defects in specifications and requirements. You can ‘get stuck’ with SPARK, finding that a program is impossible to implement as specified or finding VCs that cannot be discharged. Subtle ambiguities and contradictions in specifications and requirements can be discovered this way. Finding such defects prior to any dynamic analysis (e.g. testing) also generates a significant saving of time and effort.

3. Social acceptance

To apply strong software verification technology, suppliers need to convince many stakeholders of its usefulness. Customers, government agencies, regulators, engineers, educators, the public and many more need to be convinced that such technology is cost-effective, scalable and produces a better product. Legal precedents for the liability of software producers and even the existence of any best-practice at all in software engineering remain elusive. In a few industries (e.g. UK military aerospace) some success has been reported: most SPARK users carry out useful proof of non-trivial programs and the various stakeholders seem convinced of its worth. In other domains, many hearts and minds are still to be won.

References

Barnes, J. 2003 High integrity software: the SPARK approach to safety and security. MA, USA: Addison-Wesley. See also www.sparkada.com.

King, S., Hammond, J., Chapman, R. & Pryor, A. 2000 Is proof more cost effective than testing? IEEE Trans. Software Eng. 26, 675–686. (doi:10.1109/32.879807)


Panelist position statement: reasoning about the design of programs

BY CLIFF B. JONES

University of Newcastle upon Tyne ([email protected])

I have long been involved in using formal notation to explain computer systems and to record our understanding. My views are, therefore, more concerned with the extent to which what one does when one reasons about software can be compared with normal mathematics than with whether or not software theorem provers can help mathematicians.

All scientists and engineers build models which capture essential abstractions of complex systems; at different times we might focus on different facets of a system. Not only does one seek brevity and abstraction, one also seeks a tractable notation which facilitates reasoning about—or calculation of properties of—the subject system. It is often the case that rather deeper results are required to justify the use of a particular reasoning style.

I became involved in what is often termed ‘formal methods’ (for computing) when it became clear that programming languages were becoming too complex to handle via informal methods. (Working on final testing of a major compiler for PL/I in IBM convinced me that quality cannot be achieved by any post-hoc technique—even though we designed automatic test tools which were ahead of their time.) The major benefit of writing a formal description of a computer system or programming language is that it helps simplify the design or ‘architecture’; messy interactions of features are spotted long before effort to implement them uncovers the problems (which if detected late are likely to be patched in even more messy ways).

My view of the role of proof (or, as I want to propose, ‘rigorous argument’) is similar to this description of the usefulness of abstract specifications. In all but the most trivial cases, whenever I have been faced with a challenge to prove that an extant program satisfies its specification, I have failed! What I have been able to do is to start again with a formal specification and systematically develop a new program. The new program might use concepts from the one I had tried to understand; it might also embody new ideas, which were prompted by the abstractions. Comparison between the original program and the redeveloped one will often uncover errors in the former, while the latter comes complete with a design rationale which can help others understand it.

My position is that any process that starts only after a sloppy design phase is doomed. This is true if that post facto process is testing, model checking or even proving. It is the ‘scrap and rework’ involved in removing errors which wastes effort in software development. Formalism pays off when it can be used to detect
flaws before further work is based on such mistakes. Formal development methods for computer systems like VDM (Jones 1990) or B (Abrial 1996) use a ‘posit and prove’ style which fits well with engineering practice; an individual design decision is made and justified before further work is based on it.

Numerous examples could be listed where steps of data reification or operation decomposition provide real insight into the design of a program. (One of my own current research goals is to devise a method of ‘atomicity refinement’.)

Having been one of the first to use the term ‘rigorous argument’ in connection with this sort of development, I would like to say why I think it closely resembles the sort of outline proofs praised by mathematicians. One can characterize ‘rigour’ as being capable of formalization. In program design, a step of data reification might be justified by recording a ‘retrieve function’; if doubt arises, in say a review of a design step, the author can be pressed to add more detail; in the extreme, one might push this to a fully formal proof. Unlike the position of my co-panelist’s original paper, I see this as being rather like the social process in mathematics.
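
As a minimal sketch of the kind of proof obligation a retrieve function generates (the example and all of its names are invented, with Python standing in for VDM notation):

    # Abstract specification: a set of registered keys. Concrete design: a
    # sorted, duplicate-free list. The retrieve function recovers the abstract
    # state from the concrete one; each design step is justified by showing
    # that the concrete operation commutes with retrieval.

    from typing import List, Set

    def retrieve(conc: List[int]) -> Set[int]:
        """Map a concrete state back to the abstract state it represents."""
        return set(conc)

    def abs_register(state: Set[int], k: int) -> Set[int]:
        """Abstract operation, specified on sets."""
        return state | {k}

    def conc_register(state: List[int], k: int) -> List[int]:
        """Concrete operation: insert into a sorted list without duplicates."""
        return state if k in state else sorted(state + [k])

    # Proof obligation (here merely tested, not proved): retrieval commutes
    # with the operation for every concrete state s and key k.
    s = [1, 3, 7]
    assert retrieve(conc_register(s, 5)) == abs_register(retrieve(s), 5)

In a rigorous development the commuting property would be argued for all states, and pressed to a fully formal proof only if doubt arose.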

There is of course a place for mechanical proof checkers or automatic theorem provers if they can be made sufficiently usable. Jean-Raymond Abrial—for example—has a tool which automatically discharges the vast majority of ‘proof obligations’ in a careful development using B. But it takes good (mathematical) taste to break the development down in just the right way to achieve this. I would also add that changes are the norm in software, and automatic tools are useful in tracing their impact.

Nor in my opinion does the above exhaust the similarities to mathematical proof. The late Ole-Johan Dahl made the point that program proving would never take off unless we built up bodies of theorems about our basic tools (abstract data objects and frequently used control constructs): this is like the task of developing an interesting mathematical theory. Beyond that, there is the justification of the methods themselves: initial forms of data reification proofs relied on a homomorphism from representation to abstraction; there were interesting cases involving non-determinism where VDM’s rules were not complete; finding and justifying complete rules was an important meta-result.

Finally, I should like to add a word about ‘separation of concerns’. It is obvious that a proof about a program in a (Baroque) programming language is dubious. But a clearly recorded semantics for a safe subset of such a language can offer a way of dividing a huge task into two more tractable steps: the semantics should be used as a basis for the compiler design/verification and used to provide assumptions when reasoning about programs in the language.

References

Abrial, J.-R. 1996 The B-Book: Assigning programs to meanings. Cambridge University Press.
Jones, C. B. 1990 Systematic software development using VDM. Prentice Hall International.


Panelist position statement: logic and models in computer science

BY URSULA MARTIN

Queen Mary University of London ([email protected])

Modern computing products are among the most complex engineering artefacts so far created. For example, Microsoft’s Windows operating system has around 100 million lines of code, predicted to grow by 33% a year. Intel’s Itanium 2 processor has around 400 million transistors, with Moore’s law predicting a doubling in that number every 18 months. Ensuring the correctness of such large and complex systems demands increasing resource—for example, verification is claimed to take 50% of the cost of chip design, rising to 70% in some companies. The damage to users and the companies themselves, caused by widely publicised flaws such as the Pentium division bug or Internet Explorer security loopholes, means companies like Intel and Microsoft are devoting increased attention to verification both in research and product divisions.

These techniques build on years of earlier research into appropriate mathematical theories for modelling processes and devices (Jones 2003). Pioneers like von Neumann, Goldstine and Turing in the 1940s understood that computing could be viewed as a branch of logic; in the 1960s Hoare and Floyd developed logics for reasoning about assertions and programs; in the 1970s Scott and Strachey developed the rigorous notions of semantics that allowed us to understand what a program did independently of the particular machine architectures it was running on, and Milner laid the foundations of the theories we need to study distributed and interacting processes. The influence of this work is seen in the design of modern programming languages like Java and the take-up of logic-based verification techniques: today, articles in trade magazines like EE Times routinely mention ‘assertions’ or ‘model checking’, which until a few years ago were the preserve of academic specialists.

Effective machine support is essential if these ideas are to be applied in practice (MacKenzie 2001). Milner and Gordon were among the first to develop the theorem proving tools that made it practical to apply these theoretical ideas to obtain correctness proofs in domains where calculation or verification by hand would be totally infeasible. These tools build up formal proofs from axioms and rules and are particularly valuable when large numbers of cases of rather similar results need to be verified, as useful tactics for semi-automation may be devised even if the theory is undecidable. Clarke devised model checking, which provided counterexamples when such correctness proofs failed and has proved particularly useful for systems modelled as finite-state automata.
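
A toy sketch of the idea, for an invented two-counter system (this illustrates explicit-state model checking in general, not any particular tool of Clarke’s):

    # Breadth-first search over the reachable states of a finite transition
    # system; if the invariant fails anywhere, a counterexample trace is
    # returned, otherwise the invariant holds in every reachable state.

    from collections import deque

    def check_invariant(init, successors, invariant):
        frontier = deque([(init, [init])])
        seen = {init}
        while frontier:
            state, trace = frontier.popleft()
            if not invariant(state):
                return trace  # counterexample trace from the initial state
            for nxt in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, trace + [nxt]))
        return None  # invariant verified

    # Hypothetical system: two counters, each step increments one modulo 4.
    succ = lambda s: [((s[0] + 1) % 4, s[1]), (s[0], (s[1] + 1) % 4)]
    print(check_invariant((0, 0), succ, lambda s: s[0] + s[1] != 6))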


Current applications of theorem proving are far from the dreams of the early pioneers, of running fully verified software on fully verified hardware: the companies involved are making pragmatic business decisions about how best to incorporate useful techniques into their existing well-developed design infrastructure. For example, at Microsoft, Ball and others (Ball et al. 2004) have developed theorem provers and model checkers to verify device drivers—the tricky code that makes peripherals, like monitors or printers, work correctly (usually). Windows contains thousands of drivers—in their tests of 30 properties of the Microsoft Windows Parallel Port device driver, the prover was called 487 716 times: on average, these queries contained 19 unique atoms and 40 instances of Boolean operators per query and, in the worst case, one query contained 658 unique atoms and another contained 96 691 instances of the Boolean operators. At Intel, Harrison (Aagaard & Harrison 2000) used theorem proving in a version of Gordon’s HOL system to verify the floating point division of the IA-64 against the IEEE standard. The IA-64 refines the hardware’s division in software, and his work necessitated developing theories of real arithmetic and analysis in HOL, in order to verify large numbers of special cases according to edge effects on registers and the like; this in turn involved treating various Diophantine equations that determined where the bad cases lay. In both cases, once the underlying mathematical theories are established and the infrastructure set up in the theorem prover, it may be used over and over again to derive results about particular cases which are important for the matter at hand but unlikely to be of great mathematical significance.

This context provides a way of addressing questions about different kinds of mathematical approach. As an analogy, let us consider applying the theory of dynamical systems to determine the stability of a fighter aircraft:

(i) The theory of dynamical systems is well developed, building on the foundations of Newton and Leibniz to provide precise definitions and theorems, for example conditions for solutions to exist, to be stable and so on. Even so, this development took several hundred years, with surprises along the way, for example Poincaré’s discovery of chaotic phenomena.

(ii) To investigate a particular system, for example to establish that it is stable, we use practical engineering knowledge to determine an appropriate model. For established disciplines like aircraft design our choice of model may be codified as fairly rigorous protocols and standards, based on the theories of dynamical systems and control engineering, which are well established as a lingua franca in the community.

(iii) To establish properties of the model we run tests on it, for example computing eigenvalues of a linear system to verify stability (see the sketch below), most likely with machine assistance in a program such as MatLab to ensure accurate answers. The results of these tests are unlikely to be of general importance in the theory—they act as routine calculations.
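
The following sketch shows the routine calculation of stage (iii) for a hypothetical linearized model dx/dt = Ax (the matrix is invented; the test, that every eigenvalue of A has negative real part, is the standard criterion for asymptotic stability):

    import numpy as np

    # Invented linearized model of the system under study.
    A = np.array([[0.0, 1.0],
                  [-2.0, -3.0]])

    # Routine calculation: stable iff all eigenvalues lie in the left half-plane.
    eigenvalues = np.linalg.eigvals(A)
    stable = all(ev.real < 0 for ev in eigenvalues)
    print(eigenvalues, 'stable' if stable else 'unstable')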

If this process produces an unsatisfactory answer it may arise from problems at any of the three stages: incorrect computation of the eigenvalues, poor choice of a model or possibly, though unlikely, from a hitherto unnoticed flaw in the underlying theory of differential equations.


The development of verification described above follows this pattern in outline, but by contrast we have a great variety of mathematical models and verification tools under consideration, much less of a consensus about how or where to apply them, or about a unified approach to education and standards.

It seems to me that this should be viewed as a remarkable opportunity, rather than a weakness. The dream of some of the early workers in mechanised proof was that this would transform mathematics. It has not yet done so and maybe never will, but it has made possible the machine verification of routine yet unwieldy mathematical results that are needed to model computational phenomena (Martin 1999). By the standards of computer science, the development of classical mathematics is a slow process, yet it has a humbling depth and richness which we can aspire to in developing our present theories and the software tools that support them, so that they can bring to computational systems the power that calculus has brought to physics and engineering. By the standards of mathematics, modern computer science may seem to have identified a ridiculous variety of theories that may sometimes appear rather shallow, yet it has enormous challenges to meet in modelling, understanding and predicting next generation networks and devices. There is plenty of exciting work ahead for everyone.

References

Aagaard, M. & Harrison, J. (eds) 2000 Theorem proving in higher order logics, 13th international conference, TPHOLs 2000, Portland, Oregon, USA, August 14–18, 2000, Proceedings. Lecture notes in computer science, vol. 1869. Berlin: Springer.

Ball, T., Cook, B., Levin, V. & Rajamani, S. K. 2004 SLAM and static driver verifier: technology transfer of formal methods inside Microsoft. In Integrated formal methods, 4th international conference, IFM 2004, Canterbury, UK, April 4–7, 2004, Proceedings (ed. E. A. Boiten, J. Derrick & G. Smith). Lecture notes in computer science, vol. 2999, pp. 1–20. Berlin: Springer.

Jones, C. B. 2003 The early search for tractable ways of reasoning about programs. IEEE Ann. Hist. Comput. 25, 26–49. (doi:10.1109/MAHC.2003.1203057)

MacKenzie, D. 2001 Mechanizing proof: computing, risk, and trust. Cambridge, MA: MIT Press.
Martin, U. 1999 Computers, reasoning and mathematical practice. In Computational logic. Proceedings of the NATO Advanced Study Institute on Computational Logic, Marktoberdorf, Germany, 29 July–10 August 1997. NATO ASI Series, Computer and systems sciences, vol. 165, pp. 301–346. Berlin: Springer.


Highly complex proofs and implications of such proofs

BY MICHAEL ASCHBACHER

Department of Mathematics, California Institute of Technology, Pasadena, CA 91125, USA

([email protected])

Conventional wisdom says the ideal proof should be short, simple, and elegant. However there are now examples of very long, complicated proofs, and as mathematics continues to mature, more examples are likely to appear. Such proofs raise various issues. For example, it is impossible to write out a very long and complicated argument without error, so is such a ‘proof’ really a proof? What conditions make complex proofs necessary, possible, and of interest? Is the mathematics involved in dealing with information-rich problems qualitatively different from more traditional mathematics?

Keywords: complex; proof; simple group; classification

Conventional wisdom says the ideal mathematical proof should be short, simple and elegant. However, there are now examples of very long, complicated proofs, and as mathematics continues to mature, more examples are likely to appear.

I have some experience with one such effort: the Classification of the finite simple groups. I’m going to use the Classification theorem and its proof as a basis for discussion, but I’m not going to state the theorem or go into details about the proof. Rather I’ll treat the Classification and its proof as a black box, in that I’ll begin by listing some features of the theorem and its proof, and later use them to help illustrate some of the points I hope to make.

First, the proof of the Classification is very long and complicated. As a guess, the proof involves perhaps 10 000 pages in hundreds of papers, written by hundreds of mathematicians. It would be difficult to establish exactly which papers are actually a necessary part of the proof, and I know of no published outline. At least this last difficulty will be eliminated by a program in progress, whose aim is to carefully write down in one place a complete and somewhat simplified version of most of the proof. Still, there has not been as much improvement and simplification of the original proof as one might expect.

Second, the theorem is very useful. One cannot do serious finite group theory without the Classification, and it has made possible numerous applications of finite group theory in other branches of mathematics. One can speculate that a proof of the complexity of the Classification would be unlikely to evolve in the absence of such strong incentives. One can also speculate that such theorems can only be proved via some kind of evolutionary process: the extent of the problem
and possible paths to a solution only become visible after a large amount of preliminary investigation and experimentation.

Third, at first glance the Classification is a prototypical classification theorem: it considers a class C of objects (in this case the class of finite simple groups), supplies a list L of objects in the class and proves that each member of C is isomorphic to exactly one member of L.
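
In symbols, and only as a schematic of the shape of such a theorem (the notation is ours, not Aschbacher’s):

\[ \forall\, G \in C \ \ \exists!\, L_0 \in L : \ G \cong L_0. \]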

But also fourth, the collection L of examples is large, varied, and of great interest, and each member has a rich structure. The Classification does more than just show L and C are equal; its proof supplies a wealth of detailed information about the structure of members of L. Such information is a prerequisite for applying the Classification. Thus, after a bit more thought, the Classification is more than just a ‘classification theorem’.

Fifth, the proof is inductive and depends upon a good knowledge of the structure of members of L. That is to say, one considers a minimal counterexample to the Classification: an object G of minimal order subject to G in C but not in L. Then all proper simple ‘sections’ of G in C are in L, and most arguments are based on strong information about such sections, available in the inductive context.

As an aside, it is worth noting that there exists no theorem which says: each sufficiently large member of C is in L. If we’ve made mistakes, so that the theorem is false and there is some H in C\L, then it might be possible to repair the theorem by adding H to L and making minor modifications to the inductive ‘proof’. This would be true if the structure of H is much like that of the members of L. But if H has a very different structure, one could imagine that such a modification might not be possible.

Now I’d like to draw some implications from the example. I began with the observation that the ideal proof is short, simple, and elegant. The proof in our example has none of these desirable qualities. That hasn’t stopped mathematicians from appealing to the theorem, but it does raise various questions.

First, because of the complexity of the proof and the absence of a definitive treatment in the literature, one can ask if the theorem has really been proved. After all, the probability of an error in the proof is one. Indeed, presumably any attempt to write down a proof of such a theorem must contain many mistakes. Human beings are not capable of writing up a 10 000 page argument which is entirely free of errors. Thus, if we demand that our proofs be error free, then the Classification can’t be proved via techniques currently available.

However, in practice, mathematicians seem only to take this idealized notion of a proof as a model toward which to strive. The real standard would appear to be an argument which deals carefully with all fundamental difficulties, and which organizes and does due diligence to the small details, so that there are few gaps or minor errors, and those that exist can be filled in or repaired without much difficulty by the reader. I suspect most professional mathematicians feel that, after some (high) minimal standard of rigor has been met, it is more important that the proof convey understanding than that all formal details appear without error.

This suggests we should consider a bit more carefully the role ‘proof’ plays in mathematics. At Caltech, pure mathematics is part of the ‘Division of Physics, Mathematics, and Astronomy’. This gives me a little insight into the difference between how mathematicians and physicists view the notion of ‘proof’. For the physicist, the truth of a theory or hypothesis is established by testing it against
physical data. My sense is that most physicists feel proofs are nice, but not all that important.

On the other hand, for the mathematician, truth is established via proofs, since that portion of a particular mathematical universe visible via ‘experiment’ may be too small to be representative of the total universe. But the process of producing a proof does more: it leads to a deeper understanding of the mathematical universe the mathematician is considering.

Moreover, proofs and fields of mathematics evolve over time. The first proof of a theorem is usually relatively complicated and unpleasant. But if the result is sufficiently important, new approaches replace or refine the original proof, usually by embedding it in a more sophisticated conceptual context, until the theorem eventually comes to be viewed as an obvious corollary of a larger theoretical construct. Thus proofs are a means for establishing what is real and what is not, but also a vehicle for arriving at a deeper understanding of mathematical reality.

By consensus of the community of group theorists, the Classification has been accepted as a theorem for roughly 25 years, despite the fact that, for at least part of that period, gaps in the proof were known to exist. At this point in time, all known gaps have been filled. The most significant of these (involving the so-called ‘quasithin groups’) was only recently removed in the lengthy two-volume work of Aschbacher and Smith. During the 25 years, the proof of the Classification has not evolved as much as one might expect. Some simplifications and conceptual improvements to certain parts of the argument have emerged, and there is a program in progress to write down the proof more carefully in one place. Dependence on computer-aided proofs for the existence and uniqueness of the so-called sporadic groups has been almost entirely eliminated. But for the most part the proof still has the same shape and complexity.

To set the stage for one explanation of these facts, and to further explore why the proof of the Classification (and by extension other proofs) should be so complicated, I present a quote from the biologist John Hopfield talking about the core curriculum at Caltech:

Physics was ... often presented as the paradigm for how science should be done. The idea was that a science should require as little actual knowledge as possible, and that all conclusions should follow from a very small set of facts and equations. Biology is an information-rich subject. Complex structures and behaviors are intrinsic to (and the essence of) biology and other information-rich sciences.

John Hopfield

I believe the Classification is an example of mathematics coming to grips with a complex information-rich problem using both Hopfield’s physics paradigm and his biology paradigm. The hypothesis of the theorem is simple and easily understood by anyone who has taken a decent undergraduate course in abstract algebra. The conclusion also appears at first glance to be at least moderately simple. However, when one looks more closely, one finds that it takes some effort and sophistication to define many of the examples. Moreover, the utility of the theorem stems from two facts. First, it seems to be possible to reduce most questions about finite groups to questions about simple groups. Second, the explicit description of the groups on the list L, supplied by very effective
representations of most of the groups, makes it possible to obtain a vast amount of detailed information about the groups.

Fact one makes it possible to avoid the untenable complexity and relative lack of structure of the general finite group. The reduction from the general finite group to the finite simple group corresponds to a reduction from a universe with relatively little structure and much complexity (such as the universe of biology) to a universe with a lot of structure and manageable complexity. But for those who use the theorem, those changes are hidden in the proof.

However, consumers must still grapple with the complexity inherent in the simple groups themselves. This is where fact two comes in. More and more in modern mathematics, particularly in problems in discrete mathematics coming from fields like information theory, computer science, or biology, one must deal with objects with little classical mathematical structure, but under hypotheses placing strong constraints on the objects which are difficult to exploit in the absence of structure. Many such problems can be translated into the domain of group theory, where suitable information about simple groups can be used to obtain a solution.

Further, I speculate that the Classification is itself an early example of this kind of result. A priori it is difficult to make use of the hypothesis that a group is simple: the assumption does not automatically supply a nice representation of the group. The variety of examples in L suggests this must be true. Instead, one must exploit detailed information about the members of L in the inductive setting of the minimal counterexample, operating more in the paradigm of biology than in the paradigm of physics or classical mathematics. It is my sense that there is an overabundance of information in the problem, which makes possible many different proofs, depending on how one utilizes the information. Producing a good proof in such a situation may be less a result of a clever idea or a new, better point of view, than of optimal organization of a very large set of data, and good technique.

My guess is that we will begin to encounter many more such problems, theorems, and proofs in the near future. As a result we will need to re-examine what constitutes a proof, and what constitutes a good proof. Elegance and simplicity should remain important criteria in judging mathematics, but the applicability and consequences of a result are also important, and sometimes these criteria conflict. I believe that some fundamental theorems do not admit simple elegant treatments, and the proofs of such theorems may of necessity be long and complicated. Our standards of rigor and beauty must be sufficiently broad and realistic to allow us to accept and appreciate such results and their proofs. As mathematicians we will inevitably use such theorems when it is necessary in the practice of our trade; our philosophy and aesthetics should reflect this reality.

This work was partially supported by NSF-0203417.

Discussion

P. H. A. SNEATH (Infection, Immunity and Inflammation, University of Leicester, UK). In biology one must often make a large number of assumptions before one
can formulate a theorem, and then the proof may be very simple. The question is whether it is really a proof. To give an example from bacteriology, how does one identify a strain of the typhoid bacillus, Salmonella typhi, and prove the identity? In principle one collects many strains that have certainly come from cases of typhoid fever, and determines numerous properties of these accurately. One then sets up a model in which the species S. typhi can be likened to a swarm of bees in a multidimensional space. An unknown strain is identified as S. typhi if it lies within the swarm. But after making these and other assumptions (including that the variation is haphazard—effectively random—and that the swarm is perhaps distributed multivariate normally, but not multivariate logistically) the proof is simple. One can obtain the probability that the unknown bacillus is a typhoid bacillus from the well-known properties of the normal distribution. Further, the results are robust; a few mistakes do not greatly damage the conclusions. But it is evident the prior assumptions are the critical factor, because one can scarcely check the identity by infecting a volunteer.
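
For concreteness, here is a sketch of the calculation Sneath describes, with invented numbers: the swarm is modelled as a multivariate normal distribution, and the unknown strain is judged by its Mahalanobis distance from the swarm’s centre.

    import numpy as np
    from scipy.stats import chi2

    mean = np.array([5.0, 2.0, 7.0])      # hypothetical centre of the swarm
    cov = np.diag([1.0, 0.5, 2.0])        # hypothetical covariance of the swarm
    unknown = np.array([5.5, 1.8, 8.0])   # measured properties of the strain

    # Squared Mahalanobis distance, chi-squared with 3 degrees of freedom
    # under the multivariate normal model.
    d2 = (unknown - mean) @ np.linalg.inv(cov) @ (unknown - mean)
    p_value = chi2.sf(d2, df=len(mean))
    print(f'Mahalanobis d^2 = {d2:.2f}, p = {p_value:.3f}')

A small p-value would place the strain outside the swarm; as Sneath says, the arithmetic is simple and the real weight falls on the prior assumptions.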

M. ASCHBACHER. Sneath gives an example where a biological process is modeled by a mathematical system. As I interpret it, he then asks: is a proof of a theorem in the mathematical system also a ‘proof’ of a ‘theorem’ about biology? It would seem to me that the notions of ‘theorem’ and ‘proof’ (at least as understood by mathematicians) are particular to mathematics. As Sneath suggests, the information the mathematical theorem gives about the biological problem is only as good as the fit of the mathematical model to the original problem. Even if the fit is good, it is not clear to me that translations of theorems and proofs in the mathematical setting to the biological setting can be called ‘theorems’ and ‘proofs’ without straining the meaning of those words to the breaking point. On the other hand, theorems in the mathematical setting do give biological information when the model is good.

A. BUNDY (School of Informatics, University of Edinburgh, UK). How can we account for the unreasonable robustness of proofs? Naively, we might expect most errors in proofs to be fatal, but many are readily fixed. Why is this?

M. ASCHBACHER. Some proofs are robust and others are not. I think mathematicians operate on at least two levels: the formal and the intuitive. Consider an area of mathematics where formal machinery is in place, which has been worked out fairly carefully and in detail, and in addition the intuition of the community of specialists in the area is in tune with that machinery. In such a situation, theorems advanced by capable members of the community are usually unlikely to have large, irreparable errors, or at least it is unlikely that such errors will not be discovered by the community. The intuition of the community (and the individual) will normally lead them to those places in the proof where serious errors are likely to occur. In such a situation the individual mathematician usually finds serious errors in his or her proof before the proof sees the light of day, and the community identifies flawed mathematics before it gains wide acceptance. On the other hand, problems can arise when untested, unfamiliar machinery is applied, or when the community encounters a situation where counterintuitive phenomena are involved.

M. ATIYAH (School of Mathematics & Statistics, University of Edinburgh, UK). An analogy has been made between evolutionary biology, in which complex
organisms emerged as a result of long random processes and natural selection, and complex mathematical problems such as the classification of finite simple groups; I think this is not a correct analogy. Finite simple groups did not emerge from some random choice of axiom systems; they were a product of the human mind, though reflecting the notion of symmetry in the natural world.

M. ASCHBACHER. I think the analogy I’d draw is between the evolution of biological organisms and certain proofs. At some level, both the complex organism and the complex proof are examples of complex adaptive systems. True, proofs do not emerge entirely randomly. But for a long period of time, each individual mathematician working on his or her small part of the problem almost certainly has no serious global strategy. As a group, the community’s approach will be influenced by mathematical precedents, but in time new ideas will emerge which alter the accepted paradigms. Subgroups will concentrate on subproblems, and develop highly specialized mathematics to deal with their subproblem. Eventually enough structure emerges from the union of these specialties to suggest a global strategy. Finally a proof is achieved, but not a proof anyone could have foreseen when the process began. Moreover, if different people had been involved, or if the same people had looked at things a bit differently, then a totally different proof might have resulted.

A. IRELAND (Department of Computer Sciences, Heriot-Watt University, UK). To a large extent computer science is concerned with the systematic management of large and complex evolving artefacts (systems). Yesterday we heard from computer scientists and artificial intelligence practitioners on computer-assisted reasoning. As a working mathematician, were there any ideas presented yesterday that you feel may assist you in managing the complexity of your evolving proofs?

M. ASCHBACHER. I suspect that, for the most part, one can’t do much to manage complex proofs. In the case of the classification of the finite simple groups, at a fairly late stage in the game (about 1970), Danny Gorenstein began to speculate on a global strategy for a proof. In effect he called attention to certain subproblems, which appeared to be approachable, or almost approachable, and he put forward a somewhat vague vision of how to attack some of the subproblems, and how his various modules might be assembled into a proof. While his program was sometimes a bit far from what eventually emerged, in other instances he was fairly prescient. In any event, Gorenstein focused attention on the problem of classifying the finite simple groups, in the process making the effort more visible. He also gave it some structure and served as a clearing house for what had been done, and was being done. In short, Gorenstein managed the community of finite simple group theorists, and to a lesser extent, managed part of the development of the proof itself. But he was only able to accomplish even these limited goals at a fairly late stage in the game: the last 10 years of an effort which was reasonably intense for about 25 years, and in some sense went on for almost a century. That is to say, a long period of learning and experimentation was necessary before it was possible to develop a global idea of how a proof might proceed. Finally, these observations are really only about the sociology of the community searching for the proof, rather than about strategies and techniques for dealing with complex mathematics, beyond the obvious approach of partitioning a big problem into smaller pieces.


Skolem and pessimism about proof in mathematics

BY PAUL J. COHEN

Department of Mathematics, Stanford University, Bldg. 380, 450 Serra Mall, Stanford, CA 94305-2125, USA

([email protected])

Attitudes towards formalization and proof have gone through large swings during the last 150 years. We sketch the development from Frege’s first formalization, to the debates over intuitionism and other schools, through Hilbert’s program and the decisive blow of the Gödel Incompleteness Theorem. A critical role is played by the Skolem–Löwenheim Theorem, which showed that no first-order axiom system can characterize a unique infinite model. Skolem himself regarded this as a body blow to the belief that mathematics can be reliably founded only on formal axiomatic systems. In a remarkably prescient paper, he even sketches the possibility of interesting new models for set theory itself, something later realized by the method of forcing. This is in contrast to Hilbert’s belief that mathematics could resolve all its questions. We discuss the role of new axioms for set theory, questions in set theory itself, and their relevance for number theory. We then look in detail at what the methods of the predicate calculus, i.e. mathematical reasoning, really entail. The conclusion is that there is no reasonable basis for Hilbert’s assumption. The vast majority of questions even in elementary number theory, of reasonable complexity, are beyond the reach of any such reasoning. Of course this cannot be proved and we present only plausibility arguments. The great success of mathematics comes from considering ‘natural problems’, those which are related to previous work and offer a good chance of being solved. The great glories of human reasoning, beginning with the Greek discovery of geometry, are in no way diminished by this pessimistic view. We end by wishing good health to present-day mathematics and the mathematics of many centuries to come.

Keywords: proof; predicate calculus; axiom system; model; Skolem paradox

1. Introduction

I should like to thank the organizers of the conference for inviting me to express my ideas on the nature of mathematical proof. What I have to say may be somewhat anachronistic, in that I shall review a debate that raged almost a century ago, but which has been quiescent lately. Nevertheless, in light of what has occurred, I believe that one can come to some reasonable conclusions about the current state of mathematical proof. Most of the references to the older
literature are to be found in the excellent collection ‘From Frege to Gödel’, edited by Jean van Heijenoort (1971).

The title of my talk alludes to both the work of Thoralf Skolem, and, perhaps even more, to the conclusions he came to at a rather early stage of the development of mathematical logic. The work is, of course, the famous Löwenheim–Skolem Theorem, for which Skolem gave a simplified proof, and which is undoubtedly the most basic result about general axiomatic systems. It can be given various formulations, but the form which Skolem himself attributes to Löwenheim is that ‘every first order expression is either contradictory or satisfiable in a denumerably infinite domain’ (Skolem 1970). As Skolem showed, there is a natural extension to the case of countably many such expressions. ‘Contradictory’ here is defined by reference to the rules of the predicate calculus, i.e. normal mathematical reasoning. The startling conclusion that Skolem drew is the famous Skolem Paradox, that any of the usual axiom systems for set theory will have countable models, unless they are contradictory. Since I will not assume that my audience are all trained logicians, I point out that though the set of reals from the countable model is countable seen from outside, there is no function ‘living in the model’ which puts it in one-to-one correspondence with the set of integers of the model. This fact and other considerations led Skolem to this viewpoint:

I believed that it was so clear that axiomatization in terms of sets was not a satisfactory ultimate foundation of mathematics, that mathematicians would, for the most part, not be very much concerned by it.

The view that I shall present differs somewhat from this, and is in a sense more radical, namely that it is unreasonable to expect that any reasoning of the type we call rigorous mathematics can hope to resolve all but the tiniest fraction of possible mathematical questions.

The theorem of Löwenheim–Skolem was the first truly important discovery about formal systems in general, and it remains probably the most basic. It is not a negative result at all, but plays an important role in many situations. For example, in Gödel’s proof of the consistency of the Continuum Hypothesis, the fact that the hypothesis holds in the universe of constructible sets is essentially an application of the theorem. In Skolem’s presentation of the basic theorem, it reads like a plausible, natural theorem in mathematics, unencumbered by the jargon prevalent both in many papers of the time, and, above all, in the contemporary philosophical debates concerning the foundations of mathematics. As the reader can verify by referring to van Heijenoort’s reference book, all of Skolem’s writings on logic and set theory have a clarity and simplicity which is striking. Even now it is truly rewarding to read these papers and reflect on them.

Now, no discussion of proof can fail to refer to the Incompleteness Theorem of Gödel. The result states that no reasonable system of mathematics can prove its own consistency, where the latter is stated as a theorem about proofs in its own formal system, and hence can be construed as a result in combinatorics or number theory. The Incompleteness Theorem is a theorem of mathematics, and not a philosophical statement. Thus, in this sense, it is unassailable, but, in another sense, since it refers to such a specific question, it is not really relevant to the question which I am addressing in this talk, namely the extent to which problems in mathematics can reasonably be expected to be settled by
mathematical reasoning. It is, of course, the first, and perhaps the only, proved statement supporting the basic pessimism of Skolem’s viewpoint.

Let me begin by recalling some facts concerning the development of the axiomatic method, which I am sure are familiar to all of you. With the publication of Frege’s epic work ‘Begriffsschrift’ in 1879, the notion of a formal system was given a definitive form. Important related work was done by Boole and Peirce, and later Peano presented a similar approach, but with Frege’s work, for the first time in the history of human thought, the notion of logical deduction was given a completely precise formulation. Frege’s work not only included a description of the language (which we might nowadays call the ‘machine language’), but also a description of the rules for manipulating this language, which is nowadays known as predicate calculus. Now the Greeks had introduced the axiomatic method, and Leibniz had speculated about a universal deductive mechanism. Thus, as with many great discoveries, the precise formulation of what is meant by a formal system grew gradually in the collective unconscious, and so perhaps did not appear to many people at the time as a breakthrough. Certainly no radically new ideas were introduced, nor any particularly difficult problems overcome. But this was a major landmark. For the first time one could speak precisely about proofs and axiomatic systems. The work was largely duplicated by others, e.g. Russell and Whitehead, who gave their own formulations and notations, and even Hilbert made several attempts to reformulate the basic notion of a formal system. The variety of such attempts relates to the problem of clearly distinguishing between the axioms which are assumed as the starting point of a theory and the methods of deduction which are to be used. The Gödel Completeness Theorem, which many people regard as implicit in Skolem’s work, explicitly shows that there is no ambiguity in the rules of deduction. This is in marked contrast to the Incompleteness Theorem, which shows that no reasonable axiom system can be complete.

Alongside these developments, there raged a lively debate, continuing almost to the onset of World War 2, about the ultimate validity of mathematics. This debate saw the emergence of formalism, logicism and intuitionism as competitors for the correct foundation of mathematics. I will briefly discuss these competing philosophies, noting at the outset that each seems to focus on proofs rather than models. In this respect Skolem’s ideas were in sharp contrast to those of most of his contemporaries. I believe that today the situation is rather the reverse, due in part to my own work, showing how many models of set theory can be constructed using the notion of forcing (Cohen 1966). Indeed, Skolem even foresaw, in his 1922 paper, the construction of new models of set theory, for there he states:

‘It would in any case be of much greater interest if one could prove that a new subset of Z could be adjoined without giving rise to contradictions; but this would probably be very difficult.’ As I said, his interest in models was perhaps ahead of his time, so let me discuss now some of the common viewpoints on foundations.

First, I would mention the belief of Hilbert that the beautiful structure of mathematics, erected in the course of centuries, was in some sense sacrosanct, not to be challenged. Indeed, he felt that mathematical knowledge was our birthright, and that in principle human reasoning could decide all mathematical questions. He felt it necessary to defend, at all costs, mathematics from the
attacks of such as Kronecker and Brouwer. In his 1904 article he summarizes the viewpoints of Kronecker, Helmholtz, Christoffel, Frege, Dedekind and Cantor, finding deficiencies in their viewpoints, and offering his own treatment as an alternative. I am not very impressed by his efforts in this paper, but greatly admire the tenacity with which he defends the inviolability of mathematical reasoning. Perhaps he himself realized the difficulties of giving any completely satisfactory foundation, and so retreated, if I may use the expression, to a more modest position: that at least if we regard mathematics as a formal game played with symbols, we should be able to show that the game is consistent. This became known as the Hilbert Program, and though many attempts were made, not too much was accomplished, the reasons for which became clear when Gödel proved his Incompleteness Theorem. The Program survived in some form, under the name of Proof Theory, and we shall later refer to Gentzen’s outstanding result in that discipline. Hilbert’s goal was informally outlined, since what was meant by a consistency proof was not entirely explicit. In his basic belief that beyond any doubt mathematics was referring to an existing reality, and that it must be made secure from all philosophical attacks, he undoubtedly enjoyed the support of the vast majority of mathematicians.

Second, there arose a school that questioned methods of proof involving what may be called non-constructive reasoning. Foremost proponents were Brouwer and Weyl, both very distinguished mathematicians. The objections strike at the use of the classical predicate calculus, rejecting for example the use of Excluded Middle and related non-constructive proofs of existence. The school of Intuitionism probably never obtained much support among working mathematicians, but it has repeatedly resurfaced in various forms, for example in the work of Errett Bishop on constructive analysis. In some forms, the school may even reject the use of formal systems entirely, on the grounds that they are irrelevant for mathematical reasoning.

A recurring concern has been whether set theory, which speaks of infinite sets, refers to an existing reality, and if so how does one ‘know’ which axioms to accept. It is here that the greatest disparity of opinion exists (and the greatest possibility of using different consistent axiom systems).

2. Questions concerning the predicate calculus

The formulation, by Frege and others, of mathematics as a formal system must certainly be regarded as a milestone in the history of human thought. In a way it is a most curious achievement, in that it merely codified what was generally known. However, as a completed structure, reducing mathematical thought to what we today would call a machine language, and thereby eliminating any vagueness, it was a historic step. Perhaps Frege and the early workers did not completely separate the formalization of logical thinking and the rules of logical deduction. Today we clearly do so, and these rules are known as the predicate calculus. Concerning the predicate calculus itself, there is no controversy, though the intuitionists and others would restrict its use. The work of Löwenheim and Skolem, and the Completeness Theorem of Gödel, indeed show that one has an invariant, natural notion. Let me state these results now.


First, I review the formulation of the language. One has symbols for relations (of various arities) between objects. We have the logical connectives, the quantifiers, and some helpful symbols such as parentheses, commas and subscripts, and finally the symbols for individual variables and constants. The rules for manipulation of the connectives are sometimes called the Boolean or propositional calculus. Much more powerful, in the sense that they contain the crux of mathematical reasoning, are the quantifiers. These are the existential quantifier (‘there exists’) and the universal quantifier (‘for all’). The rules of propositional calculus are elementary and well known. The key step in mathematical thinking is that if a statement asserts that there exists an x such that a certain property A(x) holds, then we invent a name for such an object, call it a constant, and can then form sentences with it.

Conversely, if a universal statement asserts that A(x) holds for all x, then we can deduce A(c) for all constants. For example, if we have a constant positive real number a, and we know square roots exist for general positive reals, then we invent the symbol b for a square root of a.

Viewed this way, the rules become extremely transparent, if one takes care to avoid clash of constants and the like. The fundamental discovery of Löwenheim–Skolem, which is undoubtedly the greatest discovery in pure logic, is that the invention (or introduction) of ‘constants’ as in predicate calculus is equivalent to the construction of a ‘model’ for which the statements hold. More precisely, if the use of predicate calculus does not lead to a contradiction on the basis of a set S of sentences, then repeated use of the rules will result in a model for the system S. Moreover, the method ensures that we get a countable model if S is countable. And thus we get to the Skolem ‘Paradox’ that if a first-order system of axioms is consistent then it has a countable model, because all current systems of set theory have countably many primitives.
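
In modern notation, the theorem being described can be stated briefly: for a countable set S of first-order sentences,

\[ S \ \text{consistent} \implies S \ \text{has a model of cardinality at most } \aleph_0. \]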

As an aside, I remark that the work received amazingly little attention. Indeed Skolem remarks that he communicated these results to mathematicians in Göttingen, and was surprised that, despite this revealed ‘deficiency’ in the axiomatic method, there still existed, in his opinion, an unwarranted faith that the axiomatic method can capture the notion of mathematical truth. This is the pessimism to which I refer in the title. Later I shall refer to an even deeper pessimism, which has found little expression in the literature.

Skolem wrote in a beautiful, intuitive style, totally precise, yet more in the spirit of the rest of mathematics, unlike the fantastically pedantic style of Russell and Whitehead. Thus, Hilbert even posed as a problem the very result that Skolem had proved, and even Gödel, in his thesis where he proved what is known as the Completeness Theorem, does not seem to have appreciated what Skolem had done, although in a footnote he does acknowledge that ‘an analogous procedure was used by Skolem’. A possible explanation lies in the fact that Skolem emphasized models, and was amazingly prescient in some of his remarks concerning independence proofs in set theory. A discussion of the priority question can be found in the notes to Gödel’s Collected Works (Gödel 1986). Gödel was undoubtedly sincere in his belief that his proof was in some sense new, and in view of his monumental contributions I in no way wish to find fault with his account. What is interesting is how the more philosophical orientation of logicians of the time, even the great Hilbert, distorted their view of the field and its results. When Gödel showed, in his Incompleteness Theorem, that the Hilbert
Program was doomed, Hilbert (as far as I can find out from the records) did not even invite him to present his results in Göttingen. Gödel did not have a permanent position, and it was only due to the perspicacity of American mathematicians, who understood the significance of his work, that he was eventually appointed to the Institute for Advanced Study at Princeton.

So what are the disputes involving the rules of logic, given that the Completeness Theorem seems to say that they account for all correct reasoning in first-order logic? I will not attempt to categorize the various schools in this dispute, nor their philosophical principles. But I think that one can safely say that the differences involve the notion of constructivity, and the restriction to existence proofs based on constructive reasoning. Many people devoted their efforts to developing various parts of mathematics in a constructive manner. I think that for many the crucial issue is already present in the most basic part of mathematics, number theory. Since classical set theory is non-constructive almost by definition, in that it speaks of infinite sets, one hardly expects constructive ideas to be successful here. (Of course Gödel, in his epoch-making proof of the consistency of the Continuum Hypothesis and the Axiom of Choice, does use a notion of ‘constructibility’, but this is in an extended sense involving reference to ordinals, and thus is entirely natural within set theory.)

In number theory, most results are constructively obtained, even if it may require some work to see this. Let me give what I believe to be the first example of a truly non-constructive proof in number theory, so that the reader, if not a logician, will be exposed to some of the subtleties involved. This is the famous theorem of Skolem’s compatriot, Thue, extended by Siegel, and in a sense definitively completed by Roth. It says that an algebraic number can have only finitely many ‘good’ approximations by rational numbers. There is no need to specify the meaning of ‘good’ here, the basic idea being that the error in the approximation should be less than a certain function of the denominator of the approximating rational. The theorem has as a consequence that certain polynomial equations in two variables have only finitely many integral solutions.
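
In Roth’s definitive form, ‘good’ can be made precise: for an algebraic irrational α and any ε > 0, there are only finitely many rationals p/q with

\[ \left|\alpha - \frac{p}{q}\right| < \frac{1}{q^{2+\varepsilon}}. \]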

Now, all the classical proofs are totally ‘elementary’ (though ingenious), and are constructive except in the very last lines of the proof. Thue showed that there could not be two approximations p/q and p′/q′, where both q and q′ are greater than a number c (constructively given), and q′ greater than a constructively given power of q. Now he draws the conclusion that there can be only finitely many good approximations, since if p/q is given there is a bound for all other approximations p′/q′. This is a perfectly correct deduction, but if one does not know one solution one is in no position to bound the others. This is a most difficult problem, and, though Baker’s work has yielded constructive estimates in some cases, one seems far from constructive bounds in general. Since the time of Thue, other examples have been found, though perhaps no more than a dozen. Of course one has no proof that constructive bounds do not exist. Even if one is uncertain about the exact limits of the notion, one can, and does, ask whether there are general recursive bounds, or better primitive recursive ones.

Since I do not share the intuitionist ideology, or any of its variants, I will not raise the objections that they would raise, but clearly every mathematician must feel a certain unease about the above proof. It is simply desirable to have a more constructive proof.


There are people who are more extreme, and who claim that any inductive proof (such as the above) based on predicates with too many quantifier changes (so that no instance is immediately verifiable) should not be allowed. The most extreme view, held by at least one mathematician at a respectable university, is that eventually a contradiction will be found even in elementary number theory.

Let me say briefly why I cannot accept such limitations on the use of the predicate calculus. The reason lies in the very procedures of the predicate calculus, because in a sense every statement is proved by contradiction. The form of the proof may vary, but, in essence, the Completeness Theorem says that if a set of statements does not lead to a contradiction it is satisfiable. So, to show that something is valid, i.e. that it is necessarily satisfied, one must show that the assumption of its negation leads to a contradiction.

Since I shall refer to this procedure again later, let me emphasize in slightly more detail what the rules are. Using elementary rules one can bring every statement into prenex form. Something of prenex form will be of one of the forms 'for all x, A(x)' or 'there exists x such that A(x)', where A itself may have other quantifiers, and constants which have been introduced before. In the case of 'for all x, A(x)' one can add to the list from which one is trying to deduce a contradiction all 'A(c)'. In the case of 'there exists x such that A(x)' one adds correspondingly 'A(c)' for a new constant. If there is a contradiction derivable from our original assumption, then it will be revealed after finitely many applications of these rules of procedure, and at that point the contradiction will be obtainable by propositional calculus, as all the prenex quantifiers will have been stripped off. More specifically, as Skolem points out explicitly, we look at all the original undefined relations, and substitutions got by using the constants introduced at the various stages, and we will eventually be unable to assign consistently truth-values to the quantifier-free formulas produced by our procedure. Conversely, and this is only slightly harder to see, if we can always find truth assignments that work, we are in effect constructing a model of the original set of sentences. There are technical details involving revisiting requirements over and over, but these are not difficult. I refer the reader to Skolem's original paper for an intuitive explanation.
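A minimal executable sketch of the procedure just described may help the non-logician (this is my own illustration, not Skolem's or Cohen's). It takes a sentence already brought to the Skolemized universal form 'for all x, A(x)' with A quantifier-free, instantiates A over the ground terms level by level, and tests the accumulated quantifier-free statements for propositional consistency by brute force. The toy theory, asserting P(c), that P propagates along f, and not-P(f(f(c))), is unsatisfiable, and the contradiction is revealed at the second level of instantiation.

```python
from itertools import product

# Formulas: an atom is a string; compound formulas are tuples
# ('not', f), ('and', f, g), ('or', f, g), ('imp', f, g).

def atoms(f, acc):
    """Collect the atomic sentences occurring in f."""
    if isinstance(f, str):
        acc.add(f)
    else:
        for g in f[1:]:
            atoms(g, acc)
    return acc

def holds(f, v):
    """Evaluate f under the truth assignment v (a dict atom -> bool)."""
    if isinstance(f, str):
        return v[f]
    op = f[0]
    if op == 'not':
        return not holds(f[1], v)
    if op == 'and':
        return holds(f[1], v) and holds(f[2], v)
    if op == 'or':
        return holds(f[1], v) or holds(f[2], v)
    return (not holds(f[1], v)) or holds(f[2], v)   # 'imp'

def propositionally_satisfiable(formulas):
    """Skolem's truth-table test on the quantifier-free instances."""
    syms = sorted(set().union(*(atoms(f, set()) for f in formulas)))
    for bits in product([False, True], repeat=len(syms)):
        v = dict(zip(syms, bits))
        if all(holds(f, v) for f in formulas):
            return True
    return False

# Hypothetical example theory:  forall x (P(x) -> P(f(x))),  P(c),
# and  not P(f(f(c))).
def instance(t):
    """The instance A(t) of the universal part, for a ground term t."""
    return ('imp', f'P({t})', f'P(f({t}))')

facts = ['P(c)', ('not', 'P(f(f(c)))')]

# Enumerate the Herbrand universe level by level: c, f(c), f(f(c)), ...
term, pool = 'c', []
for level in range(1, 6):
    pool.append(instance(term))
    if not propositionally_satisfiable(pool + facts):
        print(f'contradiction revealed at level {level}')   # level 2
        break
    term = f'f({term})'
```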

Now it is clear to me that if a contradiction is obtained the original statement must be 'false'. Of course the intuitionist might argue that this is not good enough, that one wants more than a proof of contradiction from classical logic. I can only reply that in the usual, everyday mathematics, as practiced by the vast majority of mathematicians, all proofs proceed by contradiction. This may be surprising at first sight, but thinking about the above sketch of the Completeness Theorem will show that this is exactly what is done in all proofs. In my final comment, where I shall present a 'pessimistic' view, it is important that one understands the method allowed by the predicate calculus.

3. Consistency questions

During the period of the great debate, between 1910 and 1920, there emerged the Formalist School associated with Hilbert. My impression is that Hilbert shared the viewpoint of 'naive' mathematicians, that is, that existing mathematics, with its notion of proof, corresponded to a real world. And yet, in a sense formalism asserts the opposite. Hilbert wished to secure mathematics from the attacks of intuitionists and others, and therefore proposed as a minimal program to prove that formalized mathematics was consistent. No doubt this appeared at the time to be a reasonable goal, and one could even have hoped that the consistency proof might be done within elementary combinatorial mathematics (from this point of view mathematics could be construed as a combinatorial game). An accompanying idea was more daring, namely that such a combinatorial analysis might even result in a decision procedure, i.e. a method of deciding whether a given statement could be proved or not, or, even more ambitiously, for deciding the truth value of the statement in question.

This hope was of course shattered by the Gödel Incompleteness Theorem, which asserts that no reasonably complex system can prove its own consistency, unless it is inconsistent, in which case everything is provable and the system is useless. My main thesis here, which I shall discuss at the end of my lecture, is that the premise of the Hilbert program is more profoundly untrue. I claim that mathematics can prove only an incredibly small proportion of all true statements. But for now I discuss some technical issues in Proof Theory.

The proof of Incompleteness can be formulated in different, essentially equivalent, ways. In particular, it is closely related to the notion of recursive or computable function, and motivated the large subject of recursive function theory, so that one cannot regard Gödel's result as purely negative.

A technical subject, Proof Theory, arose, with one of its goals to understand the fine detail of unprovability of consistency. For a given theory, one seeks a combinatorial principle which is natural and allows one to prove consistency. The first, and still most striking, results are those of Gentzen (1969), who analysed the consistency strength of elementary number theory (first-order Peano arithmetic). Since elementary number theory would seem to be needed in any kind of combinatorial analysis, it may seem silly to use number theory to prove number theory is consistent. However, Gentzen's elegant work is not circular, and can be formulated so as to yield precise information about proofs in elementary number theory. Let me sketch the idea of his proof, in my own version which I intend to publish some day.

Let us consider (in number theory) a proof P of a contradiction. In our discussion of the rules of deduction, we said that there are various possibilities, all equivalent. Now we must make matters precise. It is most natural to regard the proof as a division of cases. This means that, in various stages of the proof, we consider a division into A and not A, and regard the proof as a tree, such that starting from the top of the tree, quoting the axioms of number theory, and allowing for the division into branches, we arrive at a situation where, allowing for invention and substitution of constants as described, we have a contradiction in every branch, among the statements involving constants alone. We also allow Boolean manipulations in the usual way. Thus a proof of a contradiction becomes a tree, with a contradiction in every branch. Now, the branch structure is important, because of the structure of the axioms of number theory. The key axiom is the Axiom of Induction. Really this is a countable set of axioms, with one instance for each property A(n) involving only one free variable n. Such an instance states that one of three possibilities holds: either

A(0) is false,

or, for some n, A(n) is true and A(n+1) is false,

or A(n) is true for all n.
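In symbols: the usual induction instance (A(0) ∧ ∀n (A(n) → A(n+1))) → ∀n A(n) is being taken in the classically equivalent disjunctive form

```latex
\[
  \neg A(0) \;\lor\; \exists n\,\bigl(A(n) \land \neg A(n+1)\bigr) \;\lor\; \forall n\, A(n),
\]
```

and it is this three-way split that produces the branching in the proof tree.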

Clearly this branching is an essential feature of induction.

The idea behind Gentzen's proof is to go from P to another proof P′ of contradiction, with a simpler tree structure. How to simplify the proof? Well, in any induction branching as above, the easiest branch to investigate is the third, since it says that something is true for all n and does not assert the existence of any particular constant. Briefly, we go down the tree and wait till we encounter a particular integer, say 5, where A(5) occurs. But then induction up to 5 is obvious and can be replaced by five cases of the induction hypothesis. This has to be done carefully. However, one sees that in at least one branch no constants are created, except particular numerals such as 5 or 7. In this way the use of the induction axiom can be eliminated in at least one case.

Now, assuming that this reduction from P to P′ is defined, the question is whether the new proof of a contradiction is simpler. The set of all finite trees can be ordered in a simple manner, namely, starting from the first node of a tree, we compare two trees by comparing the branches of the trees, assuming by induction that trees whose depth is one less have already been ordered. We use the usual lexicographic ordering. Now, if we define things correctly, we can show that indeed the order of the tree goes down each time we eliminate a single use of induction. This ordering is a well-ordering, and it corresponds to the ordinal ε₀, which can also be defined as the limit of ω_n as n goes to ω, where ω₁ is ω, and ω_{n+1} is ω^{ω_n}. From Gödel's Theorem it follows that either we cannot formulate this kind of induction in the system, or we can, but we cannot prove it. The latter is the case, and in this way we reach a plausible combinatorial principle just out of reach of elementary number theory, and one from which one can prove the consistency of elementary number theory in an elementary way. Proof Theory has gone on to seek analogous principles for more complex systems, e.g. fragments of set theory.
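The ordering on finite trees can be made quite concrete. The sketch below uses my own representation, not Cohen's: a tree is a tuple of its immediate subtrees, and each tree is assigned the ordinal ω^{o(c1)} + … + ω^{o(ck)} in Cantor normal form, with the exponents o(ci) of the subtrees sorted in descending order; such ordinals are compared lexicographically, and the order type of the whole system is ε₀.

```python
from functools import cmp_to_key

# An ordinal below epsilon_0 is represented in Cantor normal form as a
# tuple of exponents, sorted in descending order, possibly repeated:
# () is 0, ((),) is omega^0 = 1, ((), ()) is 2, (((),),) is omega, etc.

def cmp_ord(a, b):
    """Compare two ordinals in this representation: -1, 0 or 1."""
    for x, y in zip(a, b):
        c = cmp_ord(x, y)
        if c:
            return c
    return (len(a) > len(b)) - (len(a) < len(b))

def ord_of(tree):
    """Ordinal of a finite tree, given as a tuple of its subtrees."""
    exps = sorted((ord_of(c) for c in tree),
                  key=cmp_to_key(cmp_ord), reverse=True)
    return tuple(exps)

# Each elimination of a use of induction is arranged to strictly lower
# the ordinal of the proof tree; since the ordering is a well-ordering,
# the reduction process terminates.
two   = ((), ())      # omega^0 + omega^0 = 2
omega = (((),),)      # omega^1
print(cmp_ord(ord_of(two), ord_of(omega)))   # -1: 2 comes before omega
```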

4. Set theory, the ultimate frontier

At about the same time as Frege was developing the first universal formal system, Cantor was developing the foundations of mathematics as based on set theory. More precisely, it can be said that Cantor realized that set theory was a legitimate area of study, perhaps not realizing that it was the basis of all mathematics. In any event, Frege made an attempt to axiomatize a universal 'set theory', and made a mistake by allowing the existence of the set of all sets, thereby getting a contradiction. One normally attributes to Zermelo the first axiomatization of set theory, in more or less the form that we consider today. However, the system was still vaguely defined, and again it was Skolem who pointed out the deficiencies (Fraenkel did so too, in a less precise way). This gives the system now known as Zermelo–Fraenkel set theory.

The development of set theory has been largely separate from that of the rest of mathematics, except perhaps for considerations around the Axiom of Choice. Nevertheless, mathematicians have as a rule regarded the problems of set theory as legitimate mathematical questions. The Continuum Hypothesis, despite the independence results, remains an object of speculation for set theorists.

It is in set theory that we encounter the greatest diversity of foundational opinions. This is because even the most devoted advocates of the various new axioms would not argue that these axioms are justified by any basic 'intuition' about sets. Let me give some examples of the scope of such axioms.

One may vary the rank of sets allowed. Conventional mathematics rarely needs to consider more than four or five iterations of the power set axiom applied to the set of integers. More iterations diminish our sense of the reality of the objects involved.

One can attempt to vary the properties allowed in the comprehension axiom, while dodging the Frege problem.

Axioms of infinity assert the existence of large cardinals whose existence cannot be proved in the Zermelo–Fraenkel system. The earliest example is that of inaccessible cardinals, and more recently one has considered much larger cardinals whose existence has remarkable consequences even for real analysis. These kinds of axioms can be extended indefinitely, it seems, and, despite the interest of their consequences, the reality of the cardinals involved becomes more and more dubious. The same can be said for more exotic axioms, of determinacy type, despite the remarkable connections now known between their consistency strength and that of large cardinals.

So we come now to one of the most basic questions. Does set theory, once we get beyond the integers, refer to an existing reality, or must it be regarded, as formalists would regard it, as an interesting formal game? In this sense, we are going beyond the scope of the conference, which concerns proof. Rather we are questioning the very sense of some things which are proved. I think that for most mathematicians set theory is attractive, but lacking the basic impact of arithmetic. There is almost a continuum of beliefs about the extended world of set theory.

A typical argument for the objective reality of set theory is that it is obtained by extrapolation from our intuitions of finite objects, and people see no reason why this has less validity. Moreover, set theory has been studied for a long time with no hint of a contradiction. It is suggested that this cannot be an accident, and thus set theory reflects an existing reality. In particular, the Continuum Hypothesis and related statements are true or false, and our task is to resolve them.

A counter-argument is that the extrapolation has no basis in reality. We cannot search through all possible sets of reals to decide the Continuum Hypothesis. We have no reason at all to believe that these sets exist. It is simply an empirical fact that no contradiction has been found.

Clearly both points of view have their strengths and weaknesses. Through the years I have sided more firmly with the formalist position. This view is tempered with a sense of reverence for all mathematics which has used set theory as a basis, and in no way do I attack the work which has been done in set theory. However, when axiom systems involving large cardinals or determinacy are used, I feel a loss of reality, even though the research is ingenious and coherent. In particular, a strong defect of the first view, for me, is the idea that if mathematics refers to a reality then human thought should resolve all mathematical questions. This leads me to my final section, on the ultimate pessimism.

5. The ultimate pessimism deriving from Skolem’s views

Skolem, in his papers, was so struck by the existence of non-isomorphic models of all but the most trivial axiom systems that he was led to doubt the relevance of any mathematical axiom system to the philosophical questions concerning foundations of mathematics. For example, he pointed out the existence of countable models of set theory. He seems to have been the first clearly to emphasize models rather than methods of proof. Whether or not he believed in an absolute model of set theory, which was beyond all attempts to describe it by axioms, is not clear to me. But certainly he was aware of the limitations on what could be proved. In a remarkable passage, he even discusses how new models of set theory might be constructed by adding sets having special properties, although he says he has no idea how this might be done. This was exactly the starting point of my own work on independence questions, although I was totally unaware that Skolem had considered the same possibility. It always seemed to me that it was futile to adopt the proof-theoretic approach and analyse the structure of proofs. Even if the formalist position is adopted, in actual thinking about mathematics one can have no intuition unless one assumes that models exist and that the structures are real.

So, let me say that I will ascribe to Skolem a view, not explicitly stated by him, that there is a reality to mathematics, but axioms cannot describe it. Indeed one goes further and says that there is no reason to think that any axiom system can adequately describe it.

Where did the confidence, expressed so vividly by Hilbert, that all questions must be resolved, come from? One view that has struck me, ever since my earliest encounters with mathematics, originates with the Greeks, and Euclid in particular. Here for the first time we see the power of the human intellect being brought to bear not only on mathematics, but also on physics and astronomy. What a fantastic thrill it must have been to live through this era and enjoy the escape from superstition and primitive beliefs, and the sudden bright light dawning of the triumph of reason alone! We have all felt this thrill, encountering, at an early age, Euclid and the wonderful beauty and completeness of his geometric system. Just a hundred years ago even the Pythagoras Theorem was regarded as a marvel of deductive reasoning, and books were published containing many proofs.

But let us recall Skolem’s theorem. How does one actually proceed in a proof ?After a finite stage one invents symbols for the objects that are known to existunder a certain assumption A. Also one makes finitely many substitutions of theconstants into universal statements, and repeats this in some dovetailingprocedure. Then one sees if there is a propositional contradiction in what is nowknown about those finitely many constants. For example, suppose one wish todisprove (and thereby prove the negation of) some statement about primes. Ifone is working in number theory, one will be able to divide into cases, accordingto the principle of induction outlined above. But, in essence, all one can do is run

2417Skolem and pessimism about proof

Phil. Trans. R. Soc. A (2005)

a check on finitely many integers derived from the hypothesis. With luck, wereach a contradiction, and thereby prove something. But suppose one asks anunnatural statement about primes, such as the twin primes question. Perhaps onthe basis of statistical considerations, we expect the primes to satisfy this law.But the primes seem rather random, and in order to prove that the statisticalhypothesis is true we have to find some logical law that implies it. Is not it verylikely that, simply as a random set of numbers, the primes do satisfy thehypothesis, but there is no logical law that implies this? Looked at from the pointof view of the Skolem construction, it would seem that we can run checks, butthey may be hopelessly weak in determining the truth.

Now, one can ask, how does the introduction of higher axioms of infinity (perhaps having analytic implications) affect whether the statement can be proved? Indeed, doesn't the Gödel Incompleteness Theorem show exactly that the consistency of a given system, which is a combinatorial, or number-theoretic, statement, gets resolved by passing to a higher infinity? Will not the use of more and more complicated set-theoretic axioms resolve more and more arithmetic statements?

My response is twofold. First, the above is a rather idealistic hope. The only statements of arithmetic, resolved by higher set theory, which are known today are basically consistency statements or close relatives. In a sense the higher systems almost assume the principles we want proved. There is no intuition as to why the consideration of the higher infinite should bring us closer to solving questions about primes. Secondly, how far can we go by extending set-theoretic axioms? As said before, one rapidly gets removed from intuition, and we have no idea at the outset how to relate the axioms to primes.

Therefore, my conclusion is the following. I believe that the vast majority of statements about the integers are totally and permanently beyond proof in any reasonable system. Here I am using proof in the sense that mathematicians use that word. Can statistical evidence be regarded as proof? I would like to have an open mind, and say 'Why not?'. If the first ten billion zeros of the zeta function lie on the line whose real part is 1/2, what conclusion shall we draw? I feel incompetent even to speculate on how future generations will regard numerical evidence of this kind.

In this pessimistic spirit, I may conclude by asking if we are witnessing the end of the era of pure proof, begun so gloriously by the Greeks. I hope that mathematics lives for a very long time, and that we do not reach that dead end for many generations to come.

References

Cohen, P. J. 1966 Set theory and the continuum hypothesis. New York: Addison-Wesley.
Gentzen, G. 1969 Collected papers of Gerhard Gentzen (ed. M. E. Szabo). Amsterdam: North-Holland.
Gödel, K. 1986 Kurt Gödel: collected works (ed. S. Feferman et al.), vol. 1. Oxford: Oxford University Press.
Skolem, Th. 1970 Selected works in logic by Th. Skolem (ed. J. E. Fenstad). Oslo: Scandinavian University Books.
van Heijenoort, J. (ed.) 1971 From Frege to Gödel. Cambridge, MA: Harvard University Press.


The mathematical significance of proof theory

BY ANGUS MACINTYRE

Queen Mary, University of London, London, UK ([email protected])

Returning to old ideas of Kreisel, I discuss how the mathematics of proof theory, often combined with tricks of the trade, can occasionally be useful in extracting hidden information from informal proofs in various areas of mathematics.

Keywords: proof; provability; mere truth; unwinding; fully formalized proof

Phil. Trans. R. Soc. A (2005) 363, 2419–2435. doi:10.1098/rsta.2005.1656. Published online 6 September 2005. One contribution of 13 to a Discussion Meeting Issue 'The nature of mathematical proof'.

1. Notions of proof, prior to proof theory

(a) Proofs in mathematics

I take the basic subject matter of these Proceedings to be mathematical proofs, as presented in the style traditional for the last two hundred years, i.e. in books or journals, in a variety of languages (usually natural languages with extra formalism specific to mathematics). Mathematical knowledge is communicated and certified between specialists using such proofs. There are other less formal methods of demonstration, in conversations or seminars, and there are surely situations, say in low-dimensional topology, where one might use various gestures to communicate the basic idea (and this would suffice for experts).

Proofs in this sense presuppose earlier proofs, and a proof is supported by a complex scaffolding, some of it erected thousands of years ago. Progress in the subject depends not only on the emergence of proofs of new results, or new proofs of old results, but also on artistry in the structuring of proofs. Proofs have to be learned, remembered at least in broad outline and later fruitfully combined with other proofs and ideas, to go further. Many remarks at the meeting had to do with the issues of visualizability and memorability, and it was clear that these are central issues for mathematicians, but much less so for computer scientists right now.

(b) Proofs of novel complexity

We are talking here of traditional, informal proofs. Several talks from this Discussion Meeting (notably those of Aschbacher (2005) and MacPherson) bear on specific contemporary proofs of novel complexity. At issue is the reliability of such proofs. This is not the first time in the history of mathematics that such concerns have been prominent, but I do not think that we face anything that deserves to be called a crisis. It seems to me likely that we will see in the near future other proofs of the kind described by MacPherson, and that we will simply become accustomed to them. Aschbacher predicts that we will see before long other classifications with the complexity of that for finite simple groups.

In some cases there is a component of computer assistance, without which one would not have a proof at all. In such cases, generally, there is a trace of dissatisfaction in the mathematical community at the presence of the large-scale computational component, and a hope that another proof will be found free of such components. As MacPherson reports, the mathematical editors of Annals of Mathematics were unable to convince themselves of the correctness of the computer component of Hales's proof, and delegated the responsibility for checking that component to the computer science community. One hopes that this kind of situation will not become the norm. After all, in principle there seems no great purely mathematical difficulty in proving the correctness of the algorithms used by Hales. Presumably the problem is rather that the algorithms are not presented in a mathematically perspicuous way. Given a clear presentation, can these correctness proofs be more than simple inductive proofs? It may be that the anxiety comes rather from fear of dependence on correctness of hardware. There are presumably quite general methods for establishing probabilities of failure of devices of this kind, not at all specific to computers assisting mathematical activity. For example, depending on one's point of view, one may fear more such failure in a military or transportation situation than in a mathematical proof. Some of us may even fear a systematic failure in our own hardware.

Another situation, discussed at this meeting, concerns the classification theory of finite simple groups, where it has not been unreasonable to doubt that there has been a complete proof at all, or at least to be unsure as to when a proof has finally been obtained. Here the issue is not at all the presence of computational assistance in the proof. One is dealing rather with a large-scale proof with many highly technical, conventional modules, not all of which had been thoroughly checked at the time of the announcement of the classification. What is needed here is a map of the proof, with clear evidence that each region is under control. It seems to me unlikely that, for either of the cases above, much would be gained by further formalizing or automation of the proofs.

(c) Idealized proofs

With the development of mathematical logic over the last 150 years, a new idealized notion of proof emerged, that of a fully formalized proof on the basis of axioms. Whatever was controversial in the so-called foundational debates of the period from Frege to Gödel, one could not deny, unless one were an intuitionist with a Brouwerian epistemology, that one had an algorithm for translating the majority of conventional mathematical statements into the semantics of one of the formalisms deriving from that debate, and that moreover one could go on to translate (again mechanically in principle) classical informal proofs into formal proofs of some accepted formal system (Principia, ZFC, or first-order Peano, etc.). I personally doubt that these translations are faithful to the creative processes of mathematics, and I deny that they constitute a case that set theory, in some form a common feature of almost all the axiomatic systems involved, is thereby established as 'the' foundation for mathematics. But I do believe that over a wide range these translations do map informal proofs to formal proofs, and that one can exploit this, even if one remains queasy about set theory, as foundation or otherwise.


That there is a long way to go from an informal proof to a formal 'Hilbertian' proof is clear, and probably few major theorems of the last 200 years have ever been fully formalized in this way. There is, however, a third possibility, namely the use of systems like Isabelle, which yield completely formalized proofs mechanically, and have instructions corresponding quite closely to the informal instructions, lemmas, definitions and other strategies of ordinary proofs. Good accounts are given in Constable's (1998) and Avigad's (2004).

Mackenzie (2005) gives an illuminating account of new uses of the term 'proof', in connection with the broad enterprise of computer verification. Moreover, as I had known from my own experience, there is considerable legal interest in issues of identity of these modern proofs. Philosophers and proof theorists have touched on the issue of a general theory of identity of proofs, but I personally regard this discussion as premature. I make some remarks on this later.

(d ) Hilbert’s formalism

Via the above loose translation, proofs themselves became the subject matter of a part of mathematics, and one could, at least if one were Hilbert, imagine significant mathematical proofs about proofs and provability. It was clear that formal proofs are finite combinatorial entities like numerals, and thus one expected proofs about them to involve inductions, perhaps even very simple inductions. Only later would one realize that there are important arguments about proofs that require distinctly nonclassical combinatorics and inductions. On the one hand, it must have been clear from the outset that one needs sufficiently strong axioms to prove certain commonplace things. For example, no one ever seriously thought that one can derive much number theory from the axioms for commutative rings, and indeed one can prove this by the kind of model theory Hilbert used in the foundations of geometry (Shepherdson 1964). Hilbert had the imagination to see that one could not exclude, on the basis of anything known at the time, that one could prove the formal consistency of strong mathematical systems in very weak ones, such as primitive recursive arithmetic. Interesting as this mathematical possibility was, it has certainly been overemphasized, and this has distracted attention from more rewarding aspects of proof theory. The goal of my talk is to give a simple account, aimed at nonspecialists, of the significance for working mathematicians of the proof theory initiated by Hilbert. The account will be very selective, and indeed will involve neither recent proof theory nor a detailed account of classical proof theory.

(e) Incompleteness

From my present perspective, the First Gödel Incompleteness theorem is almost a peripheral result in proof theory. It is better regarded as a result in the theory of provability or computability, rather than having anything to do with the fine structure of formalized proofs. The results are of striking generality, and have little to do with the individual deductive or semantic mechanisms of particular formal axiomatic systems, and indeed the essence of the First Incompleteness theorem is surely to be found in such abstract discussions of recursion theory as Smullyan's (1961). Even the Second Incompleteness theorem can be suggestively analysed axiomatically, in terms of appropriate algebraic structures, and modal logic, as has been done in the theory of the provability predicate in such works as (Boolos 1993), and had been done in less generality much earlier by Hilbert and Bernays (1934, 1939). The Second Incompleteness theorem gets interesting twists in the case of strengthenings of ZFC, where one shows, e.g. that one cannot prove the existence, in a system S say, of a certain large cardinal, because that would allow one to prove the formal consistency of S in S. This is certainly a useful check on alleged proofs of existence, but little more.

Inevitably, people have tried to miniaturize the Gödel machinery to get insights on P = NP, but with limited success. Of course new information has been uncovered, notably the failure of bounded arithmetic with total exponentiation to prove even the consistency of the feeble system Q (Paris & Wilkie 1987). Note that in contrast this system proves Matejasevic's theorem, and e.g. the prime number theorem.

(f) Completeness

Godel’s Completeness theorem of 1930 has a different flavour. This is sensitiveto the formal system used, though the method is very general, extending to allsorts of logics, some useful for computer science. The usual method of proof is toshow that if something is not provable, then there is a structure, built from thesyntax of the language using a relation of provable equivalence, in which theformula comes out false. One is not constructing a proof, or doing anything withthe fine structure of proofs. Rather one is arguing by contradiction. Moreover, oneknows, by the general technology of Incompleteness, that there are unprovableformula whose negation has no recursive model. It is noteworthy that twodistinguished mathematicians, Herbrand and Skolem, were somehow blockedfrom (the simple proof of) the Completeness theorem by ideological constraintsaround the notion of truth. In this connection, one should read Cohen (2005),where he gives a related, but ultimately different, perspective on Skolem’s role.

(g) How Incompleteness evolved

Developments subsequent to Gödel, refining the form of definition of the basic notions of recursion theory, and culminating in Matejasevic's theorem (Davis et al. 1976), show that the incompleteness phenomena go right down to sentences expressing the unsolvability of diophantine equations. It is at least startling to see how set-theoretic principles affect what one can prove about diophantine equations. Note that the diophantine incompleteness known now involves high-dimensional varieties, but it cannot be excluded, on the basis of present knowledge, that it goes right down to curves. This is a deplorable possibility (Shafarevic's gloomy joke) as it would have negative implications for major conjectures in arithmetic.

One of the most beautiful descendants of the First Incompleteness theorem is Higman's (1961), characterizing the finitely generated subgroups of finitely presented groups. This result is surely significant for group theory, but depends on methods that originated in proof theory.

(h) Unprovability theory

The work of Gödel and Cohen on the Continuum Hypothesis (CH) and the Axiom of Choice (AC) is not really proof theory, though it is certainly of the first importance for provability in systems of set theory. Cohen's work is of even greater importance, as having provided a flexible general method for constructing models of set theory. It has revived a flagging subject, and led to 40 years of spectacular activity (in combination with other methods). In the cases of both Cohen and Gödel, one got much more than mere independence from their proofs.

I cannot resist mentioning a couple of cases where these results are relevant to mathematics apparently remote from set theory. Serre had in the 1950s, after his famous work on homotopy groups (Serre 1953), raised the issue as to whether one might really need the axiom of choice to prove the kind of statements he had proved. Serre had noted the use of Choice in connection with a homology/cohomology 'duality'. Kreisel clarified the matter to Serre's satisfaction, thus. Over some weak system, the statements are formally equivalent to arithmetical statements, where the quantifiers range over only integers (finite ordinals, elements of ω). Serre's proof was, by inspection, in ZFC. Now, if there were a model M of ZF in which the statement failed, it would also fail in the model of constructible elements of M, because of its syntactic form, and so it would fail in a model of the axiom of choice. But Serre had a proof that it held in any model of ZFC. So it holds in all models of ZF, so there is a proof from these axioms, by Gödel. Note that there is a recipe for translating Serre's proof into one not using ZFC, but Serre probably had no need to see this proof.
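Schematically, for an arithmetical sentence σ (my paraphrase of the argument just given):

```latex
\[
\begin{aligned}
&\mathrm{ZFC} \vdash \sigma \ \text{ and } \ M \models \mathrm{ZF}
  \;\Longrightarrow\; L^{M} \models \mathrm{ZFC}
  \;\Longrightarrow\; L^{M} \models \sigma \\
&\;\Longrightarrow\; M \models \sigma
  \qquad (\text{since } \omega^{L^{M}} = \omega^{M}
          \text{ and } \sigma \text{ is arithmetical}),
\end{aligned}
\]
```

so σ holds in every model of ZF, and hence ZF ⊢ σ by the Completeness theorem.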

An important point here is the need to pay attention to the syntactic form of the assertion. This is always going to be relevant for interesting results.

A related example would use CH instead, showing that it cannot have any significance as far as provability of arithmetical results in ZFC is concerned. Examples of the use of this (worth knowing, though the method is not currently in use) are in Ax & Kochen (1965).

I do not know if anyone has ever worried about using the negation of AC or CH in number theory, but from the details of Cohen's method one sees that these are also eliminable, because Cohen's method does not extend the integers of models of set theory.

My second example also concerns the axiom of choice, this time its role in Deligne's monumental (Deligne 1980). At some point Deligne would get by more easily if he had an embedding of the p-adic numbers in the complexes (something which clearly exists by the axiom of choice). Deligne declares that AC is repugnant to him, and explicitly notes that all he needs in his subsequent proof is the embedding of any finitely generated subfield of the p-adics in the complexes. The proof of this involves just the basic combinatorics of elementary field theory, and certainly does not use AC. (One should note, too, that the construction of the p-adics does not use the axiom of choice.) On the other hand Deligne somehow has to survey his extremely complex proof, and convince himself that this is all he needs. The latter is not in doubt, but tedious (even in principle) to demonstrate formally.

Now, Kreisel’s argument would have sufficed for Deligne’s purposes, providedtwo things:

1. The conclusion he wanted is arithmetical;
2. His proof is in ZFC.


Both aspects are delicate. Deligne's most general conclusion is about constructible sheaves, and is perhaps not arithmetical. But his applications to nonprojective varieties are certainly arithmetical, so that at least for these Kreisel's argument would apply provided the whole proof was done in ZFC. It is probable that in all cases Deligne's argument can be adapted, by working in a suitable L[A]. Such an exercise seems worthwhile.

That his proof is in ZFC seems clear to me, but remarks in (Browder 1976) attributed to Manin suggest that some experts have not been quite sure about this.

Deligne is, of course, right in suspecting that some trace of AC is needed to embed the p-adics in the complexes. Cherlin and I checked (unpublished) that in Solovay's famous model (Solovay 1970) there is no such embedding, by showing that the embedding would have to be continuous.

(i) Provability theory

Another aspect of provability theory not often given the same prominence as the Gödelian phenomenon is that there are many cases now where one has a complete, intelligible (and sometimes recursive) set of axioms for a natural mathematical domain, usually geometrical. Examples are algebraically closed fields, real closed fields, p-adic fields, and various fields with analytic functions. Easy arguments use this fact to give nontrivial uniformities and/or recursive bounds sometimes not evident to the original provers of the theorems. Moreover, the bounds and algorithms coming from this area are generally relevant for computer science, though perhaps not for automated deduction.

A recurring issue at our meeting was Hales's proof of Kepler's Conjecture. It is important to stress that the point at which Hales needs computer assistance is for a proof of a statement in high-dimensional semi-algebraic geometry, one of the most important subjects where one has a Completeness theorem. In principle one can effectively find a proof of Hales's semi-algebraic result, if it is true, from purely algebraic axioms. Unfortunately, one knows that in the general case the waiting time for a proof is unrealistically long. However, logicians and computer scientists have uncovered much information on this topic, and it may be worthwhile to see if any of the accumulated insights are useful in cross-checking the Hales result or in shortening its proof.

An essential point in the above, and in most of what follows, is the unsystematic nature of the applications. One has to know a fair bit of the subject matter, and understand the proofs therein, before one can apply logical principles to get something interesting. In particular, most of the applications are done without recourse to any axiomatic system with general objectives.

We turn now to genuine proof theory, and not just provability theory.

2. Proof theory

(a) The fundamental theorems

Modern proof theory begins with the ε-theorem of Hilbert, and becomes applicable in the work of Herbrand (1930) and Gentzen (1969). This work is of permanent importance (it seems to me) in the face of the Gödelian phenomenon. The importance for computer science can hardly be disputed. I am concerned here rather with the relevance to existing mathematical proofs.

Herbrand’s work was not readily assimilated, but it is now at the base of theproof theory of computer science (where more general notions of proof orverification are en vogue). The essential feature of his method is that provabilityfrom universal axioms of an cd sentence implies that a finite disjunction ofquantifier-free instantiations is proved. This is almost trivial model theoreticallyin first-order logic, but the formal version can be extended to general shapes ofaxioms, provided one passes to suitable extensions by function symbols, and hasproved both powerful and suggestive. A typical case is the following (from page120 of Girard (1987)):

2.1 Let A be a formula in prenex form, for instance

A = ∃x ∀y ∃z ∀t R[x, y, z, t],

with R quantifier-free. Let f and g be two new function letters, with f unary and g binary. Then A is provable in the predicate calculus iff there are terms U1, …, Un, W1, …, Wn (using the letters f and g) such that

R[U1, f(U1), W1, g(U1, W1)] ∨ … ∨ R[Un, f(Un), Wn, g(Un, Wn)]

is a propositional tautology.

Model-theoretically, with the axiom of choice assumed, this is utterly trivial, and easily generalized. But the theorem has nothing to do with the axiom of choice. It is purely combinatorial and tells one that one cannot have a predicate calculus proof without having provability of a formal Herbrand disjunction. It is noteworthy that in applications (see for example Luckhardt's work discussed later) the mere knowledge that some disjunction should be looked for has been helpful in obtaining hidden bounds in proofs. Of course, the combinatorics of the Herbrand terms is in general beyond control, and indeed the proof theory of such terms is still not at a point where applications of this aspect have been obtained. The work on unification, arising from resolution, may be regarded as a special case, but little positive has been obtained.
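As a toy illustration of 2.1 (my example, not Girard's): the provable 'drinker' sentence ∃x ∀y (D(x) → D(y)) has only one universal quantifier, so only the unary letter f is needed, and with a fresh constant c and the terms U1 = c, U2 = f(c) the Herbrand disjunction (D(c) → D(f(c))) ∨ (D(f(c)) → D(f(f(c)))) is already a propositional tautology, as the brute-force check below confirms.

```python
from itertools import product

def implies(p, q):
    """Material implication."""
    return (not p) or q

# Truth-table check over the three atoms D(c), D(f(c)), D(f(f(c))).
names = ['D(c)', 'D(f(c))', 'D(f(f(c)))']
tautology = all(
    implies(v['D(c)'], v['D(f(c))']) or implies(v['D(f(c))'], v['D(f(f(c)))'])
    for bits in product([False, True], repeat=len(names))
    for v in [dict(zip(names, bits))]
)
print(tautology)   # True: two Herbrand instances suffice
```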

Gentzen’s work on cut-elimination and the subformula property, has beenpushed much further by both mathematical logicians and computer scientists.The work of both, and their followers, has allowed applications of proof theory tomathematics. This is not the place to go into an account of the sequent calculus,and an explicit statement of what is cut-elimination. It is surely better to quoteGirard (1987), p. 95:

Roughly speaking, a cut-free proof is a proof from which all formulas which are 'too general' have been banished. General formulas in a proof carry the ideas of the proof: it is only because we have general formulas that we can have short and intelligible proofs; when, in the Hauptsatz, we eliminate all these general formulas, we increase the length and obscurity of the proof: for that reason, cut-free proofs are unlikely objects for mathematical practice. Their interest lies somewhere else: these proofs are very interesting to study, because their general properties are very important.


Of course, this is not precise, but it is suggestive. For the formal details of the Hauptsatz, and all sorts of interesting asides, one can consult Girard's frequently intemperate (Girard 1987).

For an account of the connections between the two methods, see Girard (1987), p. 122.

Much nonsense has been pronounced about Gentzen's work, even by extremely distinguished people. Consistency is not really the main issue at all. He did reveal fine structure in the unprovability of consistency of PA, as a consequence of much deeper general methodology. It is not a question of proving the consistency of induction on ω by something evidently stronger. The real point is that the ordinal ε₀ has a very simple primitive recursive representation, and yet we cannot prove in PA the principle of induction for a specific quantifier-free predicate on ε₀. This principle is almost as clear intuitively as that for induction on ω for simple predicates. Gentzen showed, by what remain deep mathematical ideas, that this principle proves the consistency of PA, and, moreover, opened the way for later people to observe that one can give a satisfying answer to the question of the nature of the provably total functions of PA (in the technical sense of Buchholz & Wainer (1987)). They have to be recursive, but obtainable by a scheme of recursion on an ordinal less than ε₀. This led to an intelligible form for such functions (the Kreisel–Wainer hierarchy), and later refinements to subsystems of PA (and a research programme related to P = NP). Kreisel spotted the phenomenon, and made the basic observations, which had a strong influence on major work such as that of Ketonen and Solovay. Many later workers identified similar behaviour for other systems (including higher-order systems, e.g. Gödel's T).
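For orientation, one common presentation of such a hierarchy is the fast-growing hierarchy; the clauses below are the standard textbook definition, not something specific to Macintyre's text:

```latex
\[
  F_0(n) = n+1, \qquad
  F_{\alpha+1}(n) = F_\alpha^{\,n}(n), \qquad
  F_\lambda(n) = F_{\lambda[n]}(n) \ \ (\lambda \text{ a limit}),
\]
```

where λ[n] is the n-th term of a fixed fundamental sequence for λ. Every provably total recursive function of PA is dominated by some F_α with α < ε₀, and each such F_α is itself provably total in PA.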

In summary, Gentzen laid bare the fine structure of provability in predicate calculus, and then in some very specific and important systems (such as PA), going much deeper than Gödel. But of course one had to start afresh in seeking similar analyses for stronger (or, much later, weaker) systems, whereas one had the Gödel analysis for all natural systems. It is to be stressed that much of the later heroic work of this nature has not given any spin-off in terms of hidden information.

This development provoked Kreisel's memorable question: what more do we know when we have proved (or have a proof of) a sentence A in system S, rather than mere truth (if the system is a conventionally motivated one)?

A typical answer is that, depending on the syntactic form of the sentence, we may get rate-of-growth information on witnesses for its existential quantifiers. This is certainly potentially valuable.

(b) Complex methods in arithmetic

Gödel proved that passage to second-order arithmetic (in the axiomatic sense) increases our stock of provable first-order formulas (e.g. consistency statements). But this alone left open the possibility that proofs of arithmetical statements via complex analysis (doable in second-order arithmetic) could be replaced by first-order proofs. An example, of doubtful significance, is the elementary proof of the prime number theorem. I do not know what new information one gets from that over a classical complex analysis proof. Moreover, the elementary proof generalizes, to Dirichlet situations, with difficulty, and beyond that not at all. So it is a concrete question as to whether there is a more subtle Gödel phenomenon in analytic number theory. One has to formulate the question sensibly, of course. The model theory of sine on the reals codes second-order arithmetic, and so, suitably interpreted, proves the consistency of PA or indeed second-order arithmetic. This is not what is intended! Rather, is there something in all the interesting, mainstream proofs, using complex analysis, which allows one to reproduce them (no doubt with loss of beauty and intelligibility) in first-order arithmetic (say PA)? The fact is that there is. The first examples are due to Kreisel in the early 1950s (Kreisel 1951, 1952). The lack of detail in his discussions has been noted by Feferman in a very useful article (Feferman 1996), but to me the matter has always been clear enough. I have not seen the need to bring in general-purpose formal systems and prove the conservative nature of König's Lemma, though there is a point to this. It is equally obvious that other more ad hoc methods work.

As discussed by both Luckhardt and Feferman in 'Kreiseliana' (Odifreddi 1996), Kreisel sketched an argument for the effective content of Littlewood's famous result showing that the difference between π(x) and li(x) changes sign infinitely often. Here there are some little formal-logical tricks, which, when used by one familiar with Littlewood's proof, provide bounds (not very sharp, of course, but this was never the issue). It seems to me that this proof of Littlewood is readily converted to one in bounded arithmetic plus total exponentiation, and this alone would yield iterated exponential bounds, with the length of the tower coming from a careful analysis of the inductions used in the proof.
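For a sense of scale (a standard fact, not something claimed in the text): Skewes, assuming the Riemann Hypothesis, bounded the first sign change by an iterated exponential,

```latex
\[
  \pi(x) > \mathrm{li}(x) \quad \text{for some } x < e^{e^{e^{79}}},
\]
```

a bound of exactly the tower-of-exponentials shape mentioned above.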

(c) Isabelle

We have recently been told that the prime number theorem has been done in the system Isabelle. I guessed, on hearing this, correctly, that it must be the elementary proof, i.e. the least suggestive one. It turns out that Isabelle does not have a complex number library. But then, is it doing number theory, nearly 200 years since complex analysis began to show in number theory? I do not wish to quibble here. But there is an important point. The proof formalized is one which is certainly doable in bounded arithmetic plus exponentiation, and one gets the impression that Isabelle can code this sort of thing almost routinely. But it does not deal naturally with geometric or topological arguments, and thereby is well out of step with modern number theory.

It is regrettable that one did not really have time at the meeting to get an extended statement of the goals of this kind of automatic theorem proving. Barendregt introduced the basics (and referred to the prime number theorem) but we must wait for a subsequent meeting to get a clear sight on the goals of this proof enterprise. My impression is that those making the most serious contributions to the enterprise of this kind of automatic theorem proving are not making any grand claims, and readily acknowledge that one is rather a long way from having a 'library' useful either for basic education in complex analysis or for more advanced algorithmic purposes.


(d) From mere truth to effective bounds

Sometimes one has a problem posed by high-level practitioners, such as Weil's on whether one could obtain bounds in Hasse's Norm theorem. I came to this question by noting, to my surprise, that Serre (1997) settled for a general recursive argument to decide when something is a norm. I worked out primitive recursive bounds by going through (partially with Kreisel) an analysis of class field theory, with its appeal to both analysis and cohomology. Both caused me lots of pain, particularly the use of Herbrand quotients in cohomology, where crucial information gets cut away. (It is notable that the function field case of class field theory is much more explicit than the number field case.) When I had finished, it was pointed out to me that Siegel, in a late paper, had given much better bounds, via geometry of numbers, just using the fact that the theorem was true, but without appealing to any particular proof. Thus there is a lesson here. Sometimes one does not need to do deconstruction on a proof to get constructive information about what has been proved, or is true.

(e) Unwinding

Kreisel has been the main contributor to 'unwinding'. This is the activity of taking an informal mathematical proof, giving an analysis of where it is proved (i.e. in which formal axiomatic system), and doing some thought experiments on that proof to give sharper information on what is being proved (usually more than the original prover thought). The thought experiments involve varied means, which Kreisel compared to the means of applied mathematics. It is not easy, and perhaps not even worthwhile, to formalize/systematize them. It is, on the other hand, at least healthy for mathematics if some people know these techniques very well, and look out for areas in which to apply them.

(f) Texts on unwinding

There are a number of interesting texts about this approach, not uniformly appreciative. There are the papers of Feferman and Luckhardt in 'Kreiseliana', and the recent lectures of Avigad (2004) at ASL 2004 on proof mining. These cover the majority of the applications. It is fair to say that there are not many applications after 50 years. But then, there are few applications of topos theory, or model theory of sheaves, to geometry. That is no reason to abandon the teaching of such material. In both cases, applications will come only when the material is familiar to people who know also the intended area of application. Without this combination of expertise, nothing is to be expected.

The commentators on unwinding list all of Kreisel's efforts, and the recent papers refer also to the work of Kohlenbach. Neither refers to the paper by Kreisel & Macintyre (1982), concerning the conditional proofs giving effective estimates in finiteness theorems in number theory from the assumption of effective estimates in Roth's theorem. Even granted that the promised sequel to that paper has not been written, this seems a bit strange. Contrary to the title, we did not advocate the informal method of algebraization as a replacement for unwinding methods more closely linked to proof theory. We did, however, wish to point out that, for the particular situation of Siegel's theorem, there are serious limitations to the methods based only on proof theory, and that one does better by an ad hoc treatment based partially on algebraizing (which may involve axiomatizing) the complicated mix of methods that go into the proofs of the Finiteness theorems. The treatment in Serre's (1997) was congenial to us, and we fitted our analysis to it.

From the perspective of this paper, the mathematical significance of proof theory is that it provides methods which can, if used with discretion, reorganize informal proofs so that they yield more information. Thus, I disregard the other side of the matter, that there are some beautiful and difficult theorems in proof theory, serious mathematics with no applications in mind. And, above all, I pass over in silence any thought that proof theory has significance for foundations of mathematics. That mathematics can be formalized, and that much of it can be axiomatized, is a basic discovery, essential to know, but not deep knowledge.

3. Unwinding

(a) Examples

Kreisel has brought unwinding to bear on:

1. analytic number theory;
2. number of solutions to diophantine problems;
3. bounding solutions to diophantine problems;
4. bounds in Hilbert's 17th Problem;
5. bounds in polynomial ideals.

Girard has brought proof theory to bear on proofs of van der Waerden's theorem using dynamical systems (and this might be relevant even for primes in arithmetic progression, etc.).

Luckhardt has applied Kreiselian tricks, involving the Herbrand formalism, to the issue of the number of solutions in Roth's theorem.

Kohlenbach has used serious proof theory on problems in modern approximation theory.

There is no time here (especially in the context of a Discussion Meeting) to go through details of each case (Feferman does run through the differences between cases), though I will say a little. Avigad's lectures and Kohlenbach's papers (Kohlenbach 2001; Kohlenbach & Oliva 2003a,b) provide the most systematic account I know. I personally am partial to the fulminating account given by Girard. I stress only the diversity of the problems, and the common feature that one is not dealing with a fully formalized proof here, but an informal conception of a full formalization of such a proof, to which one applies the general technology of the proof theory of Herbrand, Gentzen, Gödel, Kreisel and others.

As said before, the difference from other points of view here is that one concentrates on seeing what more logic can reveal from the mere existence of a proof in a particular system. There is nothing in the method that casts light on the use of computational auxiliaries.


(b) Fully formalized proofs

Avigad has an interesting article which contains a reasoned account of Isabelle and the like. There is no question of these systems existing to find new mathematical theorems. Rather they are designed to provide intelligible intermediaries, proofs of existing results presented not in the impenetrable armour of Hilbertian proof (as if they could be!) but rather in some natural evolving formalism that corresponds to low-level mathematical communication (but using higher-level instructions!!). This still leaves the question as to why they communicate the obscure proof and not the clear one.

(c) Girard unwinding

What I want to communicate in this meeting on the nature of proof is merely that one can apply the technical tools of mathematical logic to extract hidden and valuable information from complex proofs. These proofs need not be fully formalized, but a sine qua non of the method is an ability to understand in what, preferably weak, system the informal proof has a formal counterpart. For this you need to understand the proof. For example, if you use compactness or completeness, you should know something about the logical complexity of the predicates to which you apply these principles. Here you have to unwind the proof, or perhaps modify slightly, and then unwind. In effect, you have to be aware of a small number of systems in which the bulk of current mathematics can be carried out (ZFC is far too strong), and then you need to know some specifics of their proof theory. With this repertoire, you are in a position to extract useful information from informal proofs. However, it does not seem crucial right now to be expert on ordinal aspects of the theories, though one can well imagine that abstract invariants of proofs can at some point be valuable.

It is a somewhat startling fact that one can, as Girard does, use cut-elimination on a portion of an informal proof. This is possible because he has a clear view of a formal system in which that part of the proof can be formalized, and he understands perfectly the algebra of cut-elimination, so can apply it 'in a higher-order way'. Essentially he restructures the proof to a formal induction in arithmetic.

What does Girard do? He begins with the memorable proof, by Furstenberg and Weiss, using dynamical systems, of the van der Waerden theorem on arithmetic progressions. From the mere truth of that theorem one gets the existence of a function W(p, k) such that, given any partition of the set {0, ..., W(p, k)−1} into k classes, one of the classes contains an arithmetic progression of length p. Of course, there are then many such functions, and the question is whether W can be chosen to have better properties, e.g. in terms of computability or rate of growth.
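Girard's unwinding is of course nothing like a computer search, but the finitary content of W(p, k) is easy to exhibit. The following brute-force sketch (my own illustration, exhaustive and feasible only for the smallest cases; all names are mine) computes van der Waerden numbers directly from the definition above:

    from itertools import product

    def has_progression(s, p):
        # True if the set s of integers contains an arithmetic
        # progression of length p (p >= 2) with difference >= 1.
        for a in s:
            for d in range(1, max(s) - a + 1):
                if all(a + i * d in s for i in range(p)):
                    return True
        return False

    def vdw(p, k, limit=30):
        # Least N <= limit such that EVERY k-colouring of {0,...,N-1}
        # has a monochromatic arithmetic progression of length p.
        for N in range(1, limit + 1):
            if all(any(has_progression({i for i in range(N) if col[i] == c}, p)
                       for c in range(k))
                   for col in product(range(k), repeat=N)):
                return N
        return None

    print(vdw(3, 2))   # prints 9, the classical value W(3, 2) = 9

Girard's point is precisely that a bound of this kind can instead be extracted from the structure of the Furstenberg–Weiss proof.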

That W can be chosen recursive is general nonsense and essentially useless. That W can be chosen to have growth rate around that of the Ackermann function has been known for a long time, and can be read off from any elementary proof. After the unwinding efforts I will describe below, Shelah (1988) showed that W can be chosen primitive recursive, and indeed low in the Kreisel–Wainer hierarchy. Later still, the work of Gowers, using analytic methods, got even better bounds. The bounds by the last two authors are far better than those unwound by Girard, but this is not the point. It is not obvious how to extract bounds from Furstenberg–Weiss, and Girard shows how proof-theoretic tricks of the trade, in the hands of a master, enable one to get bounds.
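For scale, the Ackermann function just mentioned has the standard two-argument definition below (textbook material, given here only to fix ideas about what 'a bound of Ackermann type' means in practice):

    def A(m, n):
        # Standard Ackermann function: total and computable, but
        # eventually dominating every primitive recursive function.
        if m == 0:
            return n + 1
        if n == 0:
            return A(m - 1, 1)
        return A(m - 1, A(m, n - 1))

    print([A(m, 2) for m in range(4)])   # [3, 4, 7, 29]
    # A(4, 2) = 2**65536 - 3 already has 19729 decimal digits; no one
    # computes it by this recursion, which is the point about such bounds.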

There seems to me no point, in a brief discussion paper, in embarking on an outline of Girard's proof, especially as he gives a lively, explicit account in Girard (1987). The essential point is to break down the high-level dynamical proofs, done for general spaces X, into a series of proofs in arithmetic, which take account of the specific X needed at each stage of the conventional inductive proof (powers of the space used in the deduction of van der Waerden occur in this unwinding). He made his task easier by a series of 'astuces', making small observations that would gain Furstenberg–Weiss nothing, but are crucial for keeping down the complexity of the unwinding. Thus, he makes use of certain symmetry arguments, and of the fact that his space X is ultrametric, to avoid complicating the induction, and thereby the unwinding. After this preparation, Girard has a memorable geometrical picture. The process introduces other cuts, but he has managed things so that they will have no dynamical significance. After the elimination of the dynamical-systems components he gets a bound of Ackermann type for the van der Waerden theorem. What he has unwound is really a minor (from the dynamics viewpoint) variant of Furstenberg–Weiss. I again stress that it is not the quality of the bounds that matters, but the fact that a skilled proof theorist, using in an imaginative way classic theorems of the subject, can get bounds that eluded the first generation of mathematicians who gave noncombinatorial proofs of van der Waerden. It is no surprise that methods more closely tied to the subject matter will eventually do better than a powerful general method.

Girard, later in the book, does another, quite different unwinding, using the so-called no-counterexample interpretation. This method was first popularized by Kreisel, though it has its origins in work of Gödel from the 1930s (it can be derived by either the method of Herbrand or the method of Gentzen). This time he analyses directly the Furstenberg–Weiss proof via minimal dynamical systems. There is still a cut-elimination that can be done, but not on the proof of existence of minimal systems. On that he uses a sequence of no-counterexample interpretations, thus opening the way to bounds. These turn out to be at a level above that of the Ackermann function! This confirms a moral of Kreisel: that small differences in a proof can make an enormous difference to the information one can extract. This is not, of course, catastrophe theory for proofs!
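For orientation, the no-counterexample interpretation can be stated in one line for a Π⁰₃ sentence ∀x ∃y ∀z A(x, y, z) provable in arithmetic (this is the textbook formulation; details vary with the system): there is a functional Φ of controlled growth such that

    ∀x ∀f   A(x, Φ(x, f), f(Φ(x, f))),

i.e. Φ defeats every candidate counterexample function f, and it is from the growth of Φ that numerical bounds are read off.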

Girard's accounts of his two unwindings are illuminating and very explicit, but his method is perhaps a bit special. The general shape of an applicable result from proof theory is that if something is proved, then in truth it has a very special Skolemization or Herbrandization, maybe involving higher-order functionals. This may or may not have to do with a cut-elimination. Moreover, one can often make a proof in a prima facie strong system work in a much weaker system, so that, for example, one gets iterated exponential bounds by doing a proof in bounded arithmetic plus exp (in particular, it seems that all of Hardy & Wright (1979) can be codified there, including the elementary proof of the prime number theorem: what is different about the Isabelle proof?). In still other cases, one is on the non-Gödelian side of the fence, and one can use fast elimination, etc., to get bounds systematically, of course in general inferior to those got by specific methods like cohomology.


The vigilant reader may have noticed that I have mentioned only briefly the functional interpretations deriving from Gödel. They are somewhat unwieldy in practice because of the iterated implications (though, as we see below, there are contexts where they work very well). I leave it to the reader to ponder the remarks of Girard (1987) concerning the place of this interpretation in Gödel's opus.

(d) Identity of proofs

Though it may well be worthwhile (even for lawyers), the attempt to formalize notions of identity of proofs is not discussed here. When an important theorem has many proofs (e.g. quadratic reciprocity), mathematicians will want to compare the merits of these proofs. It is unlikely to be worthwhile to consider all 153 proofs of quadratic reciprocity, but most number theorists would agree that quadratic reciprocity is a part of a greater theory, class field theory, and that in turn part of a greater one (Langlands theory), and thus one is mainly interested in proofs that generalize. This is one of the defects of the elementary proof of the prime number theorem: it does not generalize widely, and moreover it suppresses sight of, e.g., zeta functions. It can very well be that a special proof has other virtues, e.g. for computational complexity. Recall the example of Sylvester's proof of the infinitude of primes (Woods 1981). It does not appeal to entities of exponential growth, almost uniquely among such proofs (at least in the sense of 'appeal' that logicians use).

The main point here is that we do not at the moment have any clear sense of the extra information concealed in different proofs of the same theorem, especially if these proofs are formalized in the same Hilbertian system. It is good to draw attention to these things, as Kreisel has often done, but it is certainly premature to attempt to formulate any philosophy of such matters. It is already important to note, as Kreisel does and Girard does, that small differences in proof can make a great difference to what can be extracted.

(e) Kohlenbach's work

Kohlenbach's papers provide a detailed account of the technicalities of unwinding, first for Chebyshev's theorem, and subsequently for more complex problems. But as always the idea is to get some formal simplification proved, and then be able to bound something usefully. This would have to happen in the Littlewood case for Kreisel to have had real success there. One does not expect general considerations to provide such bounds, and real work will always be needed. What is significant is that the proof-theoretic manipulations give a real advantage (definitely for Kohlenbach).

The functional interpretations are descended from Herbrand's theorem. This has an almost trivial model-theoretic proof in a useful special case. But the general case is typically daunting to all but proof theorists. Moreover, the Gödel functional interpretation is for intuitionistic systems, again not exactly attractive to classical mathematicians.

Herbrand's theorem can be extended to formulas of higher complexity by the devices of Skolemization and Herbrandization, though this is not how Herbrand did it. There is a useful account in Buss's (1998) survey, giving the link to staples of computer science, such as resolution and unification. But the essential point is that if something is provable in certain systems, it has a proof which gives more in the way of witnessing existential quantifiers, and is thus natural/explicit. Put differently, it reveals a nontrivial uniformity.
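To recall the mechanism (textbook material, not specific to any of the unwindings discussed here): Skolemization trades an existential quantifier for a fresh function symbol,

    ∀x ∃y φ(x, y)   ⤳   ∀x φ(x, f(x)),

and Herbrand's theorem then says that a statement ∃y ψ(y), with ψ quantifier-free and provable in pure predicate logic, has a provable finite disjunction of instances ψ(t₁) ∨ ... ∨ ψ(tₙ); the terms tᵢ are the explicit witnessing data from which the uniformities are read off.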

Kreisel has repeatedly pointed out that classical proof theory has not achieved any serious combinatorial analysis of the Herbrand terms, and that one is unlikely to go much deeper in unwinding unless one has some, perhaps quite elementary, methods for reasoning about this kind of thing.

The Gödel functional interpretation translates intuitionistic higher-order arithmetic into a quantifier-free axiom system for primitive recursive functionals of higher type. The recursion is purely formal. In particular, it provides yet another consistency proof for PA, and one may wonder what the point, or need, of that was. The point emphasized here, and derived from Kreisel, is that it is at least equally rewarding to see what this translation gives as a tool for extracting bounds from proofs.
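To give the flavour of 'primitive recursive functionals of higher type', here is a toy model (my illustration only) of the recursor of Gödel's system T, with Python functions standing in for higher-type objects and the recursion unwound iteratively:

    def R(base, step):
        # Goedel's recursor: R(base, step)(0) = base and
        # R(base, step)(n+1) = step(n, R(base, step)(n)).
        # base and step may themselves be functions (higher types).
        def rec(n):
            acc = base
            for i in range(n):
                acc = step(i, acc)
            return acc
        return rec

    # Type level 0: ordinary primitive recursion, e.g. addition.
    add = lambda m: R(m, lambda _, acc: acc + 1)
    assert add(3)(4) == 7

    # One type level up: iterating a function-to-function step,
    # which is where the extra strength of the system comes from.
    iterate = lambda f: R(lambda x: x, lambda _, g: lambda x: f(g(x)))
    assert iterate(lambda x: 2 * x)(5)(1) == 32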

On the one hand, it subsumes the no-counterexample interpretation as used by Girard. On the other, it is the method that Kohlenbach uses in his unwinding of much of classical approximation theory. A perhaps temporary advantage of Kohlenbach's work is that the bounds obtained are better than any others known.

Finally, there is the Herbrand technique used by Kreisel and later Luckhardt (1996) to bound the number of solutions in diophantine problems in cases when one does not have any bound on the location of the solutions. In fact, such situations in diophantine geometry seem to be the norm in practice. In Cohen's talk he described such situations as providing most of the very few genuinely nonconstructive proofs in mathematics. Typical examples are Siegel's theorem or Faltings's theorem, where effective estimates are known for the number of zeros, and where logic proofs can usually get this too. In the unwindings of this kind, so far, one does not need to pay any attention to the formal system in which the result was proved. Rather, one looks for a Herbrand form that is being proved. In practice one finds one, and one knows what one needs on the growth rate. Then, even for the classical proof, Luckhardt beats the Davenport–Roth bound, and for the Esnault–Viehweg proof Luckhardt was able to get by logical devices the same bound as Bombieri–van der Poorten. That he did not do better is not the point. But it is an important part of the description of the particular tricks of the trade used here that no real proof theory is used, only the guiding principle that one will be well prepared to get a bound for the number if one gets a Herbrand form with sufficiently good growth rate. Naturally, one hopes to combine this with higher technology from proof theory to do better, but no hint of how to proceed thus has been found.

4. Closing remarks

My main impression of the meeting is that the mathematicians and computer scientists did not really get close to understanding each other (despite many illuminating exchanges). The problems go well beyond those concerning strategies in theorem proving by mathematicians and the strategies of the artificial intelligence community. I, as a mathematical logician, operate closer to the borders than most participants at the meeting, and I have been left with a sense of having missed something basic. I chose to talk on the use of precise theorems from Hilbertian formal proof theory to extract hidden information from informal mathematical proofs. I have tried to reflect on the enterprise, explicitly championed by Hales on his website, of producing fully formalized proofs of important results in geometry. Hales is quite explicit that he sees this as the only way of convincing the mathematical community of the correctness of his entire proof of the Kepler Conjecture. One thing that worries me is that we seem to have no theory underlying this enterprise, and thus it is difficult to relate it to other formal activities in proof theory. Moreover, I rather doubt that complete formalization will satisfy many mathematicians.

There are some references I wish to add for those who would like to look a bit further in the directions set in this paper. One is a rich discussion by Kreisel (1990) on logical aspects of computation. The others are from Hales and the enterprise of full formalization. They state clear enough goals, but leave me with a sense of having missed something. They are the statements about Flyspeck, and that about the QED Project.1

1 See http://www.math.pitt.edu/~thales/flyspeck/.

I cannot imagine any useful sequel to our meeting in which the above discussions are not pursued.

References

Aschbacher, M. 2005 Highly complex proofs and implications of such proofs. Phil. Trans. R. Soc. A 363. (doi:10.1098/rsta.2005.1655.)
Avigad, J. Proof mining. Notes from ASL 2004, at www.andrew.cmu.edu/~avigad.
Ax, J. & Kochen, S. 1965 Diophantine problems over local fields. II. A complete set of axioms for p-adic number theory. Am. J. Math. 87, 631–648.
Boolos, G. 1993 The logic of provability. Cambridge: Cambridge University Press.
Browder, F. E. 1976 Mathematical developments arising from Hilbert problems. In Proceedings of Symposia in Pure Mathematics, vol. XXVIII. Providence, RI: American Mathematical Society.
Buchholz, W. & Wainer, S. 1987 Provably computable functions and the fast growing hierarchy. In Logic and combinatorics, Contemporary Mathematics, vol. 65. Providence, RI: American Mathematical Society.
Buss, S. 1998 An introduction to proof theory. In Handbook of proof theory (ed. S. Buss), pp. 1–78. Amsterdam: North-Holland/Elsevier.
Cohen, P. J. 2005 Skolem and pessimism about proof in mathematics. Phil. Trans. R. Soc. A 363. (doi:10.1098/rsta.2005.1656.)
Constable, R. 1998 Types in logic, mathematics and programming. In Handbook of proof theory (ed. S. Buss), pp. 683–786. Amsterdam: North-Holland/Elsevier.
Davis, M., Matejasevic, Y. & Robinson, J. 1976 Hilbert's 10th problem: positive aspects of a negative solution. In Proceedings of Symposia in Pure Mathematics, pp. 323–378. Providence, RI: American Mathematical Society.
Deligne, P. 1980 La conjecture de Weil. II. Inst. Hautes Études Sci. Publ. Math. 52, 137–252.
Feferman, S. 1996 Kreisel's "unwinding" program. In Kreiseliana, pp. 247–274. Wellesley, MA: A. K. Peters.
Gentzen, G. 1969 Investigations into logical deduction. In The collected works of Gerhard Gentzen (ed. M. E. Szabo), pp. 68–131. Amsterdam: North-Holland.
Girard, J.-Y. 1987 Proof theory and logical complexity. Studies in Proof Theory, Monographs, vol. 1, 505 pp. Naples: Bibliopolis.
Hardy, G. & Wright, E. 1979 The theory of numbers. London: Oxford University Press.
Herbrand, J. 1967 Investigations in proof theory (1930). Translation in From Frege to Gödel (ed. J. van Heijenoort), pp. 529–581. Cambridge, MA: Harvard University Press.
Higman, G. 1961 Subgroups of finitely presented groups. Proc. R. Soc. A 262, 455–474.
Hilbert, D. & Bernays, P. 1934 Grundlagen der Mathematik, vol. 1. Berlin: Springer.
Hilbert, D. & Bernays, P. 1939 Grundlagen der Mathematik, vol. 2. Berlin: Springer.
Kohlenbach, U. 2001 On the computational content of the Krasnoselski and Ishikawa fixed point theorems. In Proc. Fourth Workshop on Computability and Complexity in Analysis (ed. J. Blanck, V. Brattka, P. Hertling & K. Weihrauch), pp. 119–145. Springer LNCS 2064.
Kohlenbach, U. & Oliva, P. 2003a Proof mining: a systematic way of analysing proofs in mathematics. Proc. Steklov Inst. Math. 242, 136–164.
Kohlenbach, U. & Oliva, P. 2003b Proof mining in L1-approximation. Ann. Pure Appl. Logic 121, 1–38. (doi:10.1016/S0168-0072(02)00081-7.)
Kreisel, G. 1951 On the interpretation of non-finitist proofs. I. J. Symbolic Logic 16, 241–267.
Kreisel, G. 1952 On the interpretation of non-finitist proofs. II. Interpretation of number theory. Applications. J. Symbolic Logic 17, 43–58.
Kreisel, G. 1990 Logical aspects of computation. In Logic and computer science (ed. P. Odifreddi), pp. 205–278. San Diego: Academic Press.
Kreisel, G. & Macintyre, A. 1982 Constructive logic versus algebraization. I. The L. E. J. Brouwer Centenary Symposium (Noordwijkerhout, 1981). Stud. Logic Found. Math. 110, 217–260.
Luckhardt, H. 1996 Bounds extracted by Kreisel from ineffective proofs. In Kreiseliana, pp. 275–288. Wellesley, MA: A. K. Peters.
Mackenzie, D. 2005 Computers and the cultures of proving. Phil. Trans. R. Soc. A 363. (doi:10.1098/rsta.2005.1649.)
Odifreddi, P. (ed.) 1996 Kreiseliana. Wellesley, MA: A. K. Peters.
Paris, J. & Wilkie, A. 1987 On the scheme of induction for bounded arithmetical formulas. Ann. Pure Appl. Logic 35, 261–302. (doi:10.1016/0168-0072(87)90066-2.)
Serre, J.-P. 1953 Quelques calculs de groupes d'homotopie. C. R. Acad. Sci. Paris 236, 2475–2477.
Serre, J.-P. 1997 Lectures on the Mordell–Weil theorem. Braunschweig: Vieweg.
Shelah, S. 1988 Primitive recursive bounds for van der Waerden numbers. J. Am. Math. Soc. 1, 683–697.
Shepherdson, J. 1964 A nonstandard model for a free variable fragment of number theory. Bull. Acad. Pol. Sci. 12, 79–86.
Smullyan, R. 1961 The theory of formal systems. Annals of Mathematics Studies, vol. 47. Princeton, NJ: Princeton University Press.
Solovay, R. 1970 A model of set theory in which all sets are Lebesgue measurable. Ann. Math. 92, 1–56.
Woods, A. 1981 Ph.D. thesis, University of Manchester.


The justification of mathematical statements

BY PETER SWINNERTON-DYER

University of Cambridge, Wilberforce Road, Cambridge CB3 0WB, UK ([email protected])

The uncompromising ethos of pure mathematics in the early post-war period was that any theorem should be provided with a proof which the reader could and should check. Two things have made this no longer realistic: (i) the appearance of increasingly long and complicated proofs and (ii) the involvement of computers. This paper discusses what compromises the mathematical community needs to make as a result.

Keywords: proof; justification; programs; conjectures

Phil. Trans. R. Soc. A (2005) 363, 2437–2447
doi:10.1098/rsta.2005.1658
Published online 12 September 2005

One contribution of 13 to a Discussion Meeting Issue 'The nature of mathematical proof'.

I approached this conference with a seriously split personality. I am usually regarded as a number theorist, and therefore as a pure mathematician of the most uncompromising kind. On the other hand, I also work at the more vulgar end of the study of ordinary differential equations; indeed, for years I was the only pure mathematician in Cambridge who had a visa to enter the Department of Applied Mathematics. And for a substantial part of my career I was employed not as a mathematician but as a computer scientist. In these three roles, my attitudes to what should be regarded as a proof have been quite different.

In the real world, what is regarded as an adequate proof depends very much on what one is trying to prove. It takes far less evidence to convict a person of speeding than to convict him (or her) of murder; and nowadays it appears that even less evidence is needed to justify waging war. In mathematics we need to accept (and indeed have tacitly accepted) the same diversity. We have an ideal concept of what is meant by a rigorous proof, but in many contexts we cannot afford to live up to that standard; and even before the days of computers, mathematicians had devised various ways of loosening that straitjacket. Moreover, the amount of effort which the mathematical community puts into checking a purported proof depends very much on the importance, the unexpectedness and the beauty of the result.

The most demanding standard that I have ever encountered was that impressed on me by J. E. Littlewood, who was my research supervisor. He maintained that because so many published papers contained errors or at least gaps, one should never make use of someone else's theorem unless one had checked the proof oneself. He was of course conditioned by having lived through the traumatic process of making classical analysis rigorous; and for most of his lifetime there were important branches of pure mathematics based more on plausibility than on certainty. But if such a doctrine was ever feasible, it is certainly no longer so. The final death-blow to it may well have been the classification of the finite simple groups. But until the advent of computers, most pure mathematicians had no difficulty with the concept of a rigorous proof. To quote A. E. Housman in a rather different context: 'A terrier may not be able to define a rat, but a terrier knows a rat when he sees it.'

Not all mathematical statements are theorems. In most branches of pure mathematics there is a border-zone between what is rigorously established and what is totally mysterious. That zone is populated by what are variously called conjectures, hypotheses and open questions. If one were asked for criteria which justify making a particular conjecture, one might say that it must satisfy one or more of the following conditions:

(i) It sheds new light on the structure of the subject.
(ii) Its statement is simple and fascinating.
(iii) It is plausible and enables mathematicians to prove important results which they are currently unable to prove without it.
(iv) It can be shown to hold in particular (but preferably typical) cases.

These reasons are listed in what seems to me the order of decreasing merit. All but the second of them also tend to support the truth of the conjecture. In number theory the border-zone is particularly rich, and I shall take my examples from there.

Two of the Clay Institute's Million Dollar problems fall within number theory. The original Riemann Hypothesis was that the non-trivial zeroes of the Riemann zeta function ζ(s) all lie on the line ℜs = 1/2. Riemann's reasons for believing his Hypothesis (of which a good account can be found in Siegel (1932)) were sophisticated, and he probably had no computational evidence for it. Subsequently, more than a billion zeroes have been computed (mostly funded by the Pentagon), and they all lie on the critical line; but there are strong reasons for believing that even if counterexamples to the Riemann Hypothesis exist they will be rare and will have very large imaginary parts, so computers cannot provide strong evidence for it. Littlewood, indeed, believed that the Riemann Hypothesis was false, on the grounds that if it were true the combined efforts of classical analysts would have proved it long ago. But in my view this is to see it in the wrong context. The Riemann Hypothesis has been repeatedly generalized, and the more far-reaching the generalizations, the more central they appear to be to the structure of modern number theory. Thus, the Riemann Hypothesis ought not to be regarded as lying within classical analysis, and one ought not to hold it against classical analysts that they have not yet provided a proof of it.

In its simplest form, the Birch/Swinnerton-Dyer conjecture for an elliptic curve relates the value of the associated L-series at s = 1/2 (the mid-point of the critical strip) to the order of the Tate–Shafarevich group of the curve. (The clearest detailed description of the conjecture can be found in Tate (1966).) As Tate said, 'it relates the value of a function at a point where it is not known to exist to the order of a group which is not known to be finite'. Nevertheless, even when the conjecture was first formulated it was possible to provide numerical evidence in support of it in particular cases, primarily when the elliptic curve is defined over Q and admits complex multiplication, for in that case the L-series can be analytically continued and can be explicitly evaluated at s = 1/2. Moreover, if the Tate–Shafarevich group is finite, its order is known to be a square; and even forty years ago its p-component could be evaluated in principle for any prime p, and nearly always in practice for p = 2. Over the last forty years a lot more numerical evidence has been obtained, and special cases of the conjecture have been proved, in contrast with the Riemann Hypothesis, which still appears absolutely impregnable. It has also been vastly generalized, though it is not clear to me that these generalizations are supported by any additional evidence.

Fermat's Last Theorem fell into the category of conjectures until the work of Wiles. I am sure that Fermat believed he had proved it; and indeed one can with fair confidence reconstruct his argument, including one vital but illegitimate step. It satisfies the second of my four criteria, but none of the others, and it has not fascinated everybody. Gauss, when asked why he had never attempted to prove it, replied that he could set up a hundred such statements, which could be neither proved nor disproved and which served only to impede the progress of mathematics. But it was the attempt to prove Fermat's Last Theorem which motivated Kummer to create algebraic number theory: a rich garden to grow from a single seed.

I should also mention a conjecture which turned out to be false and which satisfied none of my criteria, but from which important developments sprang. It has been known since Dirichlet that a quadratic equation defined over Q is soluble in Q if it is soluble in each completion Q_p and R. (The corresponding result over a general algebraic number field, which is much harder to prove, is due to Hasse; so for any family of equations a result of this kind is known as a Hasse Principle.) Mordell conjectured that the corresponding result would hold for the equation of a non-singular cubic surface. He gave no reason for this, and I suspect that he put forward the conjecture partly for probabilistic reasons and partly because he could think of no other obstruction. The first counterexample depended on the sheer cussedness of small integers and threw no light on the nature of a possible obstruction in general. The second one was provided by Cassels & Guy (1966). It depended on a computer search, extensive by the standards of the time, which generated a list of diagonal cubic equations which had no small integer solutions; but the proof that the simplest equation in this list was actually insoluble did not involve a computer. These counterexamples led Manin to discover the Brauer–Manin obstruction, which plays a central role in the modern theory of Diophantine equations.

I could go on. But I hope that I have done enough to demonstrate two things: first, that at least in some branches of pure mathematics the formulation of well-justified conjectures plays an important role in advancing the subject; and second, that there is general agreement about what 'well-justified' means in this context.

Both for theorems and for conjectures, one should make a distinction between structural statements such as the Riemann Hypothesis and accidental statements such as Goldbach's conjecture. This distinction is not clear-cut; there would be disagreement, for example, about the description of the Four Colour theorem or the classification of finite simple groups. (I regard the latter as accidental, because there are so many sporadic simple groups and they are so diverse.) Most mathematicians are resigned to the likelihood that the proofs of some accidental theorems may sometimes be long, turgid and ill-motivated; but they expect that the proof of a structural theorem, even if it is long and difficult, will be in some sense straightforward.


The situation with differential equations is very different. It is true that there are theorems about differential equations which have been rigorously proved, but these tend not to answer the questions which users of differential equations actually ask. The typical situation is as follows. Consider some interesting real-world system. It is in principle possible to write down the differential equations which describe how the system varies with time, according to the laws of nature as currently understood (and ignoring the effects of the Uncertainty Principle); but these equations will be far too complicated to use as they stand. One therefore needs to make radical simplifications, hoping that the solutions of the simplified model will still describe to a good approximation the behaviour of the original system. Currently, this process seems to be a matter of pure faith; but for some systems there may be scope for a rigorous treatment. For example, in the Million Body Problem, which studies the interaction of the stars in a galaxy, the stars are treated as point masses satisfying the Newtonian laws of gravitation, though each star does gradually radiate away some of its mass. Again, Nosé managed to reduce the thermodynamics of the universe to three first-order ordinary differential equations.

To this simplified system one applies whatever tools seem appropriate. For the Nosé equations (or the better known and more studied Lorenz equations, which are another system of three first-order equations derived in much the same way) these are of three kinds:

(i) Genuinely rigorous arguments.
(ii) Arguments whose conclusion takes the form 'Such-and-such happens unless there is a rather unlikely-looking coincidence.'
(iii) Information about particular trajectories, obtained numerically.

These will not be enough to determine the behaviour of the system even qualitatively; but among the possible qualitative descriptions compatible with the information obtained there will usually be a simplest one, and an appeal to Ockham's Razor should lead us to adopt that description. This process must be regarded as a justification of the conclusion rather than a proof of it; but for differential equations there seems little prospect of ever being able to do better.

So far I have mentioned computers only peripherally. I must now turn to the issues raised by computer-based proofs; and here it is necessary to take account of the fallibility both of computers and of programmers. Most computers probably have bugs even in their hardware. (One early fixed-point machine evaluated (−1)×(−1) as 0 owing to a very plausible design fault; fortunately that was an operation not often performed.) Probably all computers have bugs in their software (their operating systems, assemblers and compilers), though if these have not been detected and cured, that implies that they very seldom cause trouble. More importantly, the process of turning an algorithm into a computer program is notoriously fallible even when the program is in a high-level language. Moreover, although it is feasible for a referee or reader to check an algorithm for errors, it is almost impossible to check that someone else's program is correct. (Some computer scientists claim that there exist programs which will rigorously check whether an algorithm has been correctly translated into a program, and there do exist similar programs which check that the design of a microchip correctly implements its specification. But no one has yet used these methods to check the correctness of any of the existing programs for proving the Four Colour theorem, and this can hardly be because computer scientists do not think that theorem important enough.) All this needs to be taken into account in deciding the level of credibility of a computer-based or computer-assisted proof.

That last phrase covers a considerable diversity of computer involvements, and they need to be separated. Let me start with what might be called the computer's role as conjurer's assistant. When you meet the word 'Consider' in a proof, you know that a rabbit is about to be pulled out of a hat; but you are unlikely to be told, nor logically do you need to be told, where the rabbit came from. We have already had one example of this. When Cassels thought of a method which might prove that some pre-assigned diagonal cubic equation was a counterexample to the Hasse Principle, he needed an equation for which the method might work; and the only way of finding such an equation was by a computerized search. Again, it is known that the set of rational points on an elliptic curve defined over Q form a finitely generated abelian group. Its torsion part is easy to compute, so what is of interest is to find its rank. For any given curve, there is a standard method for finding an upper bound r for this rank; and empirically the upper bound thus obtained is usually the actual rank. To prove this for a particular curve, we have to find r independent rational points on the curve, and this is done by means of a search program. Once such points have been obtained, it is a relatively simple task to prove that they lie on the curve; and even if a computer is used to check this, it is most unlikely to report that a given point lies on the curve when in fact it does not.
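The last step, checking that a candidate point lies on the curve, is exact rational arithmetic and trivially mechanizable; even the naive search can be sketched in a few lines (a toy illustration of the 'conjurer's assistant', not the serious descent machinery; the names are mine, and it also verifies the point with x = 29 on the curve y² = x³ − 673 from the appendix):

    from fractions import Fraction
    from math import isqrt

    def on_curve(x, y, a, b):
        # Exact check that (x, y) lies on y^2 = x^3 + a*x + b.
        x, y = Fraction(x), Fraction(y)
        return y * y == x ** 3 + a * x + b

    print(on_curve(29, 154, 0, -673))   # True: 29**3 - 673 == 154**2

    def small_points(a, b, H):
        # Naive search for rational points of height at most H.
        pts = []
        for num in range(-H, H + 1):
            for den in range(1, H + 1):
                x = Fraction(num, den)
                rhs = x ** 3 + a * x + b
                if rhs >= 0:
                    p, q = rhs.numerator, rhs.denominator
                    sp, sq = isqrt(p), isqrt(q)
                    if sp * sp == p and sq * sq == q:   # rhs is a rational square
                        pts.append((x, Fraction(sp, sq)))
        return pts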

The canonical structure of a proof, as exemplified in that unreadable tour-de-force, Russell and Whitehead's Principia Mathematica, is as follows. One starts with a finite collection of well-established statements. At each step one writes down one more statement, which must be an immediate logical consequence of the existing statements. Eventually one generates in this way the result which one is trying to prove. There is some resemblance here to a chess-playing program, in that there is an enormous choice at each step and it is essential to have a good evaluation function to tell one which is the most helpful next statement to adjoin. To the extent that this scheme could ever be made to work on a computer, the process would generate proofs which could be checked by a flesh-and-blood mathematician, or indeed by a proof-checking program. The latter would of course be far easier to write than a theorem-proving program because it would not need the evaluation subroutine, which is where (as with a chess-playing program) the fundamental difficulties occur. I myself do not believe that a theorem-proving program of this kind will ever prove theorems which are beyond the capacity of a human being. But the ideas which it would need could be applied, in a much simplified form, to a program to referee papers written by human beings; for such papers contain small gaps in the logic which the reader is expected to fill, and filling such gaps is theorem-proving of a very elementary kind. Any editor will bear witness to the need for such a program.

So far I have been dealing with cases where a computer is essential or at least useful in generating a proof, but is not needed in the proof itself; in other words, proofs which are computer-assisted rather than computer-based. I now turn to those which are genuinely computer-based. Let me give two examples. The first is the Four Colour theorem, for which two computer-based proofs have already been constructed, of which at least one is generally accepted as valid.


(For an account of the second proof, with many references, see Robertson et al. (1997); a summary of this paper is also available on the Web.) The second is a conjecture which has not yet been proved, largely because it belongs to a branch of mathematics which is not currently fashionable; but it is well within reach of modern desk-top computers, and it illustrates the points I want to make. Recall that a lattice L in Euclidean space is said to be admissible for an open set R if no point of L except the origin O lies in R. Then the assertion is that every lattice admissible for the region |X1X2X3X4| < 1 has determinant at least √725. (This is best possible if true, for the lattice L0 of integers of the totally real quartic field of discriminant 725 is certainly admissible.)

As these two examples show, a large part of a computer-based proof may be devoted to vulgar numerical calculation, but this will not always be so. Such a part presents few difficulties for checking correctness. Calculation with integers is exact, though calculation with real numbers is not. In the latter case one must take account of round-off errors, and this requires working with inequalities rather than with equalities. Where serious difficulties do occur is if processes from numerical analysis are involved: it is, for example, almost impossible to generate bounds for the solution of a differential equation which are both tight and rigorous. This is a further reason for what I said earlier, that in the study of differential equations one must accept much lower standards of justification than in most of pure mathematics.
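The standard way of 'working with inequalities rather than with equalities' is interval arithmetic, to which Davies returns in the next paper. A minimal sketch (assuming Python 3.9+ for math.nextafter; a serious implementation would use directed rounding throughout):

    import math

    class Interval:
        # Track a floating-point lower and upper bound through each
        # operation, rounding outward so the true value is enclosed.
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi

        def __add__(self, other):
            return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                            math.nextafter(self.hi + other.hi, math.inf))

        def __mul__(self, other):
            ps = [self.lo * other.lo, self.lo * other.hi,
                  self.hi * other.lo, self.hi * other.hi]
            return Interval(math.nextafter(min(ps), -math.inf),
                            math.nextafter(max(ps), math.inf))

        def __repr__(self):
            return f"[{self.lo!r}, {self.hi!r}]"

    x = Interval(0.1, 0.1)    # 0.1 itself is not exactly representable
    print(x + x + x)          # a certified enclosure of 0.3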

In a simplified form, the algorithm which is expected to prove the lattice assertion above is as follows. We look for all admissible lattices L which satisfy, say, det L < 27. By standard theory, it is enough to consider lattices which contain the point P1 = (1, 1, 1, 1). No admissible point, and hence in particular no point of the lattice other than O, is at a distance less than 2 from O; so standard theory provides an explicit constant C such that there are lattice points P2, P3, P4 within a distance C of the origin which together with P1 generate the lattice. We can think of the lattice as described by the point L = P2×P3×P4 in 12 dimensions, and information about the lattice is equivalent to information about the set in which L lies. The admissible region |X1X2X3X4| ≥ 1 is the union of 16 disjoint convex subregions, according to the signs of the Xi; we initially split cases according to which region each of the Pj lies in. Some of these cases can be immediately shown to be impossible: for example, if all the coordinates of P2 are positive then it turns out that P1−P2 cannot be admissible. More generally, for any particular case we choose integers n1, ..., n4, not all zero, and consider the lattice point P = Σ njPj. (The design of an efficient algorithm for choosing the nj is the one sophisticated part of the program.) There are now three possibilities:

(i) P cannot lie in any of the 16 admissible subregions; if so, this case can be deleted.
(ii) There is exactly one subregion in which P can lie; if so, this is a constraint on P and therefore reduces the set in which L can lie. We can now continue the process with a new choice of the nj.
(iii) There is more than one subregion in which P can lie; if so, we split this case into subcases according to the subregion in which P is assumed to lie.

Thus the process keeps on deleting old members from the list of cases to be studied but also putting new ones in. What we hope is that eventually the list reduces to a single case, and for that case the open region containing L is small and contains the point L0 corresponding to the conjectured critical lattice L0; if so, we can complete the proof by means of a known isolation theorem. If this does not happen, in due course we obtain a list of very small regions in one of which L must lie; and provided the process is error-free we expect each of these regions to provide an admissible lattice which can be found by hand.
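The shape of the computation, a worklist of cases each of which is pruned, constrained or split, is entirely generic, and the same skeleton fits the Four Colour proof described next. A schematic sketch (all three predicates are placeholders for the problem-specific mathematics):

    from collections import deque

    def exhaust(initial_cases, prune, constrain, split, max_steps=10**6):
        # Process cases until none remain (the claim is proved) or the
        # step budget runs out (no conclusion). Each case is deleted,
        # tightened, or replaced by finitely many subcases.
        work = deque(initial_cases)
        for _ in range(max_steps):
            if not work:
                return True
            case = work.popleft()
            if prune(case):                 # possibility (i)
                continue
            tightened = constrain(case)
            if tightened is not None:       # possibility (ii)
                work.append(tightened)
            else:                           # possibility (iii)
                work.extend(split(case))
        return False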

The algorithm which underlies the proof of the Four Colour theorem fits the same pattern; and indeed this appears to be the natural pattern for a long and complicated computer-based proof. Here too the proof starts with a finite list of cases, and when any particular case is processed it is either deleted or replaced by a finite number of subcases. For if the theorem is false, among the maps which cannot be coloured with only four colours there is one which contains the smallest number of regions. The list of cases is a list of sub-maps which might form part of this map. A case can be split by adjoining an extra region to the sub-map, which can be done in various ways. A case can be rejected if there is a different sub-map having fewer regions such that if the old map cannot be coloured with only four colours, then nor can the new map obtained by replacing the old sub-map by the new one. (Fortunately, this is a property which can often be established without knowing anything about the rest of the map.) The proof succeeds if the list can be exhausted.

The principle underlying such proofs is attributed to Sherlock Holmes: 'When you have eliminated the impossible, whatever remains, however improbable, must be the truth.' The point which I wish to make about computer-based proofs of this kind is as follows. Suppose that there are errors in the program, but the program does in fact terminate; since that was the result which we were expecting, we have no reason to doubt the correctness of the program, for programming errors are usually only detected because they have led to surprising results. Moreover, in a program of this kind an error is likely to lead either to some cases being wrongly rejected or to some cases never being generated by the splitting process. Either of these will make the program terminate sooner than it should have done, or even when it should not have terminated at all. In other words, errors will usually generate false proofs rather than merely failing to generate true proofs. It is this which makes validation of this kind of proof so important.

More than thirty years ago I stated what I thought was needed to validate a computer-based proof, within the limits of practicality; and I see no reason to change my views now. (I was heartened to discover at this conference that Annals of Mathematics has been forced to adopt a very similar attitude.) Suppose that Alice has produced a computer-based proof and wishes Bob to validate it; what should each of them do?

Alice should publish the algorithm which underlies the program, in so simple a form that other people (and in particular Bob) can check it. She should be very cautious about including in the algorithm the sort of gimmicks which make the program more efficient, because they also make the correctness of the algorithm harder to check. It is highly desirable, if possible, that the algorithm should also specify some intermediate output. Alice should not at this stage provide Bob with any other information; in particular she should not give Bob a copy of her program or any part of it, nor a copy of her intermediate output. Ideally, Bob should not even come from the same environment as Alice, because that would tend to give them a common mind-set. Bob should then turn the algorithm into a program, preferably not using the same language as the one which Alice used. If both programs yield the same results, including the same intermediate output, this is as much validation as can reasonably be provided.

Finally, a more general point. Manichaeans hold that power over the universe is equally divided between God and the Devil. At least until Gödel, mathematicians believed that their subject lay entirely within God's share. It is my impression that most of the speakers at this conference still hold this view, even though much of what they have said points in the opposite direction. The doctrine is well illustrated by two couplets written nearly three centuries apart, the second being written as an answer to the first:

Nature, and Nature's laws, lay hid in night;
God said 'Let Newton be!' and all was light.

But not for long; the Devil, shouting 'Ho!
Let Einstein be!' restored the status quo.

Appendix A

This appendix provides some further information about some of the topics mentioned in the body of the talk.

(i) The most general version of the Riemann Hypothesis which I know is as follows. Let f(s) be a Dirichlet series satisfying the following conditions:

(a) It occurs naturally in a number-theoretic context.
(b) It has an Euler product.
(c) It can be analytically continued as a meromorphic function over the whole s-plane, and satisfies a functional equation which relates f(s) and f(n−s) for some integer n and which up to sign is tantamount to a symmetry law.

Then all the non-trivial zeroes of f(s) lie on the critical line ℜs = n/2. This as it stands appears to contain an escape clause, in that the first condition is metamathematical rather than mathematical. But in practice there would be little disagreement whether a purported counterexample satisfied that condition or not.

(ii) Fermat's Last Theorem asserts that if n > 2 there are no solutions in positive integers of Xⁿ + Yⁿ = Zⁿ. Gauss clearly regarded it as an accidental rather than a structural theorem; but the heart of Wiles's proof is a proof of a weak form of the modularity conjecture, which is certainly a structural theorem.

(iii) The modularity conjecture (over the attribution of which controversy rages) states that each elliptic curve defined over Q can be parametrized by modular functions. The first assertion of this kind is due to Taniyama, who died young. The first substantial justification of it was given by Weil (1967), though he stated it only as an open question. The first substantial numerical evidence for it was given by Birch and his students. It has now been completely proved.

(iv) Goldbach's conjecture is that every positive even integer other than 2 is the sum of two primes. It has been proved for all even integers up to 6 × 10¹⁶. This is a case in which a purported proof of the full conjecture would deserve very careful checking, but the proof of the weaker statement in the previous sentence deserves rather little.


(v) The Nosé equations are

    ẋ = −y − xz,   ẏ = x,   ż = a(x² − 1),

where a is a positive parameter. There are certainly values of a for which the behaviour of the trajectories is chaotic both in the usual and in the technical sense; whether this is so for all values of a > 0 is not known. The Lorenz equations are

    ẋ = σ(y − x),   ẏ = rx − y − xz,   ż = xy − bz,

where σ, r and b are three real positive parameters. A good introduction to their study can be found in Sparrow (1982); a numerical sketch follows item (vii) below.

(vi) William of Ockham (or Occam) was a medieval theologian and philosopher. He stated the principle that 'entities should not be multiplied without cause', which is known as Ockham's Razor. A reasonable paraphrase would be that one should accept the simplest explanation of any phenomenon.

(vii) The simplest base for the rational points on an elliptic curve usually consists of points with numerator and denominator comparable with the coefficients in the equation of the curve; but occasionally this fails badly. For example, the group of rational points on the curve y² = x³ − 673 has rank 2, and the simplest generators are the points with

    x = 29   and   x = 3398323537/617612.

A large table of ranks and generators can be found in Cremona (1997).

References

Cassels, J. W. S. & Guy, M. J. T. 1966 On the Hasse principle for cubic surfaces. Mathematika 13, 111–120.
Cremona, J. E. 1997 Algorithms for modular elliptic curves, 2nd edn. Cambridge: Cambridge University Press.
Robertson, N., Sanders, D. P., Seymour, P. D. & Thomas, R. 1997 The four colour theorem. J. Comb. Theory Ser. B 70, 2–44. (doi:10.1006/jctb.1997.1750.)
Siegel, C. L. 1932 Über Riemanns Nachlass zur analytischen Zahlentheorie. Quellen und Studien zur Geschichte der Mathematik, Astronomie, Physik 2, 45–80. (Gesammelte Abhandlungen 1, 275–310.)
Sparrow, C. 1982 The Lorenz equations: bifurcations, chaos and strange attractors. Berlin: Springer.
Tate, J. 1966 On the conjectures of Birch and Swinnerton-Dyer and a geometric analog. Sém. Bourbaki 306.
Weil, A. 1967 Über die Bestimmung Dirichletscher Reihen durch Funktionalgleichungen. Math. Ann. 168, 149–156. (doi:10.1007/BF01361551.) (Collected Works 3, 165–172.)

Discussion

C. JONES (Computing Science Department, University of Newcastle, UK). The view that one might prefer to construct a second program (rather than study a carefully annotated one) is odd. It could be compared to a journal which only sends the statement of a new theorem to referees, asking them to provide their own proofs. This might uncover errors but would be rather wasteful! The assertions in a program provide a rigorous argument of its correctness; or careful development using, for example, data abstraction is even more like the (rigorous) proof of a theorem.

R. D. ARTHAN (Lemma 1 Ltd, Berkshire, UK). Direct evaluation of programs within a theorem-proving environment such as HOL offers a good half-way house between relying on an untrusted program and formal program verification. This has been used with some success by John Harrison and others, giving validated calculations with the real numbers. Can you comment on this?

P. SWINNERTON-DYER. The question is what degree of credibility should attach to a theorem none of whose proofs conforms to classical standards. This question is usually asked about proofs which depend on computer programs (as in these two questions), and this answer will deal only with those. Even if a computer program in a high-level language is itself correct, the results obtained by running it may be vitiated by undetected bugs in the compiler, the operating system or even the hardware, or indeed by viruses temporarily present in the computer being used. (Few if any compilers or operating systems are without bugs; and in this paper I gave an example of a hardware error in an important computer, which to my knowledge went undetected for years.) To reduce these dangers it is reasonable to insist that the program should be run twice, on essentially different computers using essentially different compilers. This does not quite meet classical standards; but it is a very modest requirement, and gives rather strong assurance that the program did do what it says it does.

But does the program do what the programmer thinks it does, and how does the mathematical community obtain reasonable assurance of this? The suggestion that the reader can actually check the correctness of a complicated published program is ludicrous; indeed, I doubt if there is anyone alive who is both willing and able to do this with a high degree of reliability for the sort of programs which gave rise to this meeting. (The difficulty is not only with fundamental errors, and indeed these are usually eradicated in the course of writing and checking the program. But slips of the pen, of a kind which also occur in published classical proofs but do little damage there, can short-cut some branches of the program; and the result is apt to be that not all possibilities have been investigated.) Working within a theorem-proving environment, even when this is feasible, does add to the credibility of a program; but for a program used in the proof of an important or unexpected theorem, the mathematical community will probably not feel that the credibility which it adds is enough. Formal program verification is not at present capable of dealing with programs as complicated as those which we are discussing in this meeting, and I am not confident that it ever will be.

N. SHAH (Durham, UK). You have raised an important point about 'theorems' whose proofs are submitted to journals, e.g. for second-order ODEs where there are no closed forms. I would urge the mathematical community to make sure that the mathematical software currently used has been proved correct; otherwise theorems will be published but, because of bugs, these theorems will not be repeatable.


P. SWINNERTON-DYER. Theory does provide methods for computing provable bounds for the solution of a given ordinary differential equation, but I do not know of any satisfactory implementation of any of these methods as a library subroutine. If one solves an ordinary differential equation by standard numerical methods, it is not hard to build in extra equations whose solutions are error estimates; these will not be provably correct, but in practice usually are correct. (As I implied in my talk, this is an area in which one must take a much more relaxed attitude to provable correctness than in pure mathematics.) In particular, over an interval in which the solution is stable, standard subroutines are good enough.

For partial differential equations the situation is much less good. But the limitation here arises from the unsatisfactory state of the theory, and this needs to be improved before one is entitled to start complaining about any shortcomings in the software.

E. B. DAVIES (Department of Mathematics, King's College London, UK). When cold fusion was 'discovered' there was immediate public criticism that the effect was entirely implausible. However, a number of laboratories set up experiments to try to duplicate the findings. In experimental science an effect is not believed until it has been confirmed independently. The mathematical community should follow the same procedure with respect to computer-assisted proofs, by requiring independent programs to be written.

P. SWINNERTON-DYER. I entirely agree. But there is one flaw in the analogy. Cold fusion was always implausible, and I am sure that most of the laboratories which tried to 'duplicate' it were actually trying to refute it. But all the computer-assisted proofs which I know of are proofs of results which everyone in the area believed to be true long before any proof was announced; and not a great deal of credit is given for producing the second proof of such a result unless that second proof differs fundamentally from the first one.


Pluralism in mathematics

BY E. B. DAVIES

Department of Mathematics, King's College, Strand, London WC2R 2LS, UK ([email protected])

We defend pluralism in mathematics, and in particular Errett Bishop's constructive approach to mathematics, on pragmatic grounds, avoiding the philosophical issues which have dissuaded many mathematicians from taking it seriously. We also explain the computational value of interval arithmetic.

Keywords: pluralism; constructive mathematics; interval arithmetic

Phil. Trans. R. Soc. A (2005) 363, 2449–2460
doi:10.1098/rsta.2005.1657
Published online 12 September 2005

One contribution of 13 to a Discussion Meeting Issue 'The nature of mathematical proof'.

1. Introduction

Errett Bishop's book 'Foundations of Constructive Analysis' appeared in 1967 and started a new era in the development of constructive mathematics. His account of the subject was entirely different from, and far more systematic than, Brouwer's programme of intuitionistic mathematics. The latter attracted a few adherents in the 1920s and 1930s, but was widely rejected because of its conflicts with the dominant classical view of the subject.

Unfortunately, Bishop's book was ignored by most mathematicians, who assumed that the issues involved had all been settled, and that he could not have anything interesting to say. My task in this meeting is to try to persuade you that his programme provides valuable insights into matters which should be of concern to anyone who has even indirect involvement in computation.

In this paper I will not discuss the philosophical issues relating to Bishop's work, which are treated at some length in Billinge (2003) and Davies (2004), beyond saying that one can admire his mathematical contributions without adopting his philosophical position. Briefly, I defend what I call pluralism in mathematics: the view that classical mathematics, constructive mathematics, computer-assisted mathematics and various forms of finitistic mathematics can coexist. I revive Carnap's dictum that one must decide the framework of discourse before questions about existence and truth make sense; see Carnap (1950). In different frameworks the answer to a question may be different, but this in no way implies that one or the other is 'right'. This position is anti-Platonistic.

From chapter 2 onwards, Bishop (1967) is completely recognizable as rigorous pure mathematics. Many well-known theorems appear, sometimes in forms which are not the usual ones, although trivially equivalent to them from a classical point of view. A few theorems are simply absent. The value of Bishop's efforts may not be immediately clear to everyone, in spite of what he writes in his first chapter. I will show that the differences between classical and constructive mathematics always correspond to situations in which real difficulties arise in numerical computation for suitable examples. Classical mathematics may provide a more flexible context for proving the existence of entities, but constructive mathematics provides a systematic approach to understanding why computational solutions of problems are sometimes not easy to obtain.

This is not to say that constructive mathematics is simply a part of numerical analysis. Numerical analysts have more problems than do constructivists. Bishop (1967, p. 140) gives a straightforward proof of the fundamental theorem of algebra, with one small but interesting proviso, even though it is known that the zeros of a polynomial of moderate degree (say at least 50) are highly unstable with respect to very small changes in the coefficients. Nevertheless, one can gain many insights into the differences between classical and constructive mathematics by considering numerical examples.

2. What is constructive mathematics?

It has often been said that Bishop rejected the law of the excluded middle, but a more useful description of the situation is that he gave the symbol ∃ a different meaning from the usual one. In classical mathematics ∃ refers to Platonic existence, but Bishop used it to refer to the production of an algorithm for constructing the relevant quantity. In classical mathematics ∃ may be defined in terms of ∀: the expression ∃x A is logically equivalent to ¬(∀x ¬A). In constructive mathematics, ∃ is a new quantifier with stricter conditions for its application. All of the differences between classical and constructive mathematics follow from the new meaning assigned to the symbol. We wish to emphasize that every theorem in Bishop's constructive mathematics is also a theorem in classical mathematics. Constructive mathematicians have to work harder to prove theorems, because their criteria for existence are stricter; the pay-off is that the statements of the theorems contain more information.

Starting from the natural numbers, Bishop constructed the real number system and established many of its familiar properties. However, in his approach, one cannot assert for every real x that either x = 0 or x ≠ 0. Nor is it the case that every bounded set of real numbers has a least upper bound. The reason for this is best illustrated with an example. For each positive integer n, we define the number a_n to be 1 if the nth digit in the decimal expansion of π is the start of a sequence of a thousand consecutive sevens; otherwise we put a_n = 0. For each n, the value of a_n can be determined within a finite length of time. However, the least upper bound A of the sequence is currently unknown. Platonically speaking either A = 0 or A = 1, even if we do not know which is the case, but Bishop would say that such a statement has no content: it simply reformulates the question in a trivial manner.
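
Each a_n can indeed be produced mechanically. The following minimal Python sketch is illustrative only: the mpmath library, the helper name a(n) and the cut-off N are our choices, not Bishop's.

```python
# Illustrative sketch of the sequence a_n: a_n = 1 if the nth decimal
# digit of pi begins a run of a thousand consecutive sevens, else 0.
# Assumes the third-party mpmath library for generating digits of pi.
from mpmath import mp, nstr

RUN = 1000         # length of the run of sevens
N = 100_000        # how many digits to inspect; an arbitrary cut-off

mp.dps = N + RUN + 10                 # working precision in decimal digits
digits = nstr(mp.pi, N + RUN)[2:]     # the digits after the leading "3."

def a(n):
    """a_n as defined in the text; decidable in finite time for each n."""
    return 1 if digits[n - 1:n - 1 + RUN] == '7' * RUN else 0

# Every a(n) is computable, yet the least upper bound of the sequence
# remains unknown, since no such run has been located to date.
print(max(a(n) for n in range(1, N + 1)))   # prints 0 for any feasible N
```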

If we put

$$ s = \sum_{n=1}^{\infty} a_n \left( -\tfrac{1}{3} \right)^{n}, $$


then s exists constructively, because it can be calculated with any assigned accuracy in a finite length of time. However, we do not know whether or not s = 0. Even if it were eventually proved that every sequence of digits occurs somewhere in the decimal expansion of π, so that s ≠ 0, whether s is positive or negative seems to be far beyond reach. The development of constructive mathematics has to reflect the existence of an infinite number of questions of the same type.
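
A constructive existence claim of this kind amounts to an approximation algorithm with an explicit error bound. A minimal sketch, reusing the illustrative a(n) helper above together with the elementary tail estimate Σ_{n>N} 3^{−n} = 3^{−N}/2:

```python
# Compute s = sum_{n>=1} a_n (-1/3)^n to within any assigned accuracy.
import math

def s_approx(eps):
    # Choose N so that the neglected tail, at most 3**(-N)/2, is < eps.
    N = math.ceil(math.log(1 / (2 * eps), 3)) + 1
    return sum(a(n) * (-1 / 3) ** n for n in range(1, N + 1))

print(s_approx(1e-12))   # prints 0.0, yet we cannot decide whether s = 0
```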

It is very surprising that Bishop could develop so much of analysis without using the least upper bound principle or the law of the excluded middle. His book contains fully worked out accounts of the theory of Banach spaces, the spectral theorem, integration theory and Haar measure, among other topics. Set theory is substantially different from the classical version, particularly when considering the complement of a subset. Point set topology is not easy to develop (see Bridges & Luminita 2003 for one approach to this), but Bishop (1967) contains a rich constructive version of the theory of metric spaces. Compactness is defined in Bishop (1967, p. 88) using the notion of total boundedness, which is the way compactness is often proved in applications. This is not a coincidence.

3. What significance do the differences have?

My goal in this section is to explain why a classical mathematician (and I emphasize that I am one) might be interested in Bishop's programme. My thesis is that by adopting the constructive framework one can handle certain numerical problems on a systematic basis, whereas classical mathematicians have to deal with them piecemeal, and try to remember whether or when they are using results which are not computationally feasible. I make no claim that constructive mathematics is superior to classical mathematics in all contexts, but only that it sometimes provides illuminating insights.

Producing examples to demonstrate the differences between classical and constructive mathematics often exploits the difference between recursive and recursively enumerable subsets of ℕ. Another method is to define a sequence whose behaviour as n → ∞ depends upon whether some famous conjecture is true or false.¹ We adopt a third strategy, showing that the impossibility of proving something in constructive mathematics is regularly associated with the extreme difficulty of showing it numerically for quite ordinary functions. We emphasize that the functions considered below do not, strictly speaking, provide examples of the constructive phenomenon, but we feel that in spite of this they explain why the constructive phenomenon exists.

¹ If one uses the Goldbach conjecture, for example, then one puts a_n = 0 if 2n may be written as the sum of two primes, and a_n = 1 if it cannot.

Let us start with the intermediate value theorem for continuous functions of a single real variable. Bishop (1967, p. 5) explains why this theorem cannot be proved in a constructive framework. In the context of constructive mathematics one cannot find a value of the variable x for which f(x) = c by the method of bisection, because being able to evaluate f(x) − c with arbitrary accuracy does not imply that one can determine whether it is positive or negative.

Slight modifications of the intermediate value theorem are, however, valid constructively. If f : [a, b] → ℝ is continuous and f(a) < c < f(b) then, given ε > 0, one can constructively find x ∈ (a, b) such that |f(x) − c| < ε. In addition the intermediate value theorem itself may be proved constructively under a mild extra condition on f (being locally non-constant), which is almost always satisfied. See Bishop & Bridges (1985) and Bridges (1998).
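
The constructive content of the ε-version can be made concrete by an approximate bisection, sketched below; the sketch is ours, not Bishop's proof. The essential point is that every branch test is decidable: whenever |f(m) − c| ≥ ε, a sufficiently accurate approximation to f(m) determines the sign of f(m) − c, so no exact zero test is ever required.

```python
# Approximate intermediate value theorem by bisection: given continuous
# f with f(a) < c < f(b) and eps > 0, return x with |f(x) - c| < eps.
def approx_ivt(f, a, b, c, eps):
    while True:
        m = (a + b) / 2
        if abs(f(m) - c) < eps:
            return m              # an eps-approximate solution
        if f(m) > c:              # decidable, since |f(m) - c| >= eps
            b = m                 # a crossing lies in [a, m]
        else:
            a = m                 # a crossing lies in [m, b]

print(approx_ivt(lambda x: x**3 - 2, 0.0, 2.0, 0.0, 1e-9))  # ~ 2**(1/3)
```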

We explain the differences between the classical and constructive versions of the intermediate value theorem by means of two examples, one ersatz and one genuine. Let

$$ f(x) = \log(1 + x^{45}), \qquad (3.1) $$

on (−1, 1) (figure 1). Given the formula, it is evident that the only solution of f(x) = 0 is x = 0, but one would need to calculate to very high accuracy to determine from its numerical values that the function is not identically zero throughout the interval (−1/4, 1/4). However many digits one uses in the numerical computation, a similar difficulty arises if one replaces 45 by a larger number. In applied situations a closed formula is frequently not available, and the above problem may be a pressing one.
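
The scale of the difficulty is easy to check numerically (our illustration):

```python
# On (-1/4, 1/4) the function f(x) = log(1 + x**45) is nonzero only at
# the level of about 1e-28, far below any moderate working accuracy.
import math
print(math.log1p(0.25 ** 45))   # approximately 8.1e-28
```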

A genuine example presenting exactly the same difficulties is obtained as follows. Let

$$ g(x) = \begin{cases} x + 1 & \text{if } x \le -1, \\ 0 & \text{if } -1 < x < 1, \\ x - 1 & \text{if } 1 \le x. \end{cases} $$

Then one cannot constructively solve g(x) = c if c is an extremely small number for which one does not know whether c = 0, c < 0 or c > 0.

[Figure 1. Graph of the function f(x) defined by equation (3.1).]


In constructive mathematics every non-negative continuous function on a closed bounded interval has a non-negative infimum; Bishop (1967, p. 35) provides a procedure for computing this with arbitrary accuracy. This does not imply that one can always determine whether the minimum is zero or positive, nor does it imply that one can find a point at which the infimum is achieved. Both are serious problems in numerical analysis as well. If ε > 0 is sufficiently small it is difficult to show purely numerically that the polynomial

$$ p(x) = x^4 - 2\pi^2 x^2 + \pi^4 + \varepsilon (x - 2)^2, \qquad (3.2) $$

never vanishes, and also difficult to determine whether its minimum value occurs near x = π or x = −π. For functions arising in applied mathematics that are not given by explicit formulae this can again be a serious problem (figure 2).

[Figure 2. Graph of the function p(x) defined by equation (3.2) with ε = 0.01.]
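
A short computation (ours) illustrates both difficulties for a representative value of ε:

```python
# p(x) = (x**2 - pi**2)**2 + eps*(x - 2)**2 is within eps*(|x| + 2)**2
# of zero at both x = pi and x = -pi, so certifying p > 0, or locating
# the minimum (near pi, since (pi - 2)**2 < (pi + 2)**2), needs working
# accuracy of the order of eps.
import math

eps = 1e-10

def p(x):
    return x**4 - 2 * math.pi**2 * x**2 + math.pi**4 + eps * (x - 2)**2

print(p(math.pi), p(-math.pi))   # roughly 1.3e-10 versus 2.6e-9
```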

The suggestion that classical mathematics takes priority over constructive mathematics because of the Putnam–Quine argument about the indispensability of the former in the physical sciences is not convincing, for reasons spelled out in Davies (2003a,b). The differences between the two systems are unimportant from the point of view of physicists; this is why Newton, Laplace, Maxwell and other scientists were able to develop very successful mathematically based theories long before the real number system was formalized at the end of the nineteenth century.

Hellman has tried to put some flesh on the Putnam–Quine argument in several recent papers, in which he claims that there are no constructive versions of some key results in mathematical physics. We start with Hellman (1993a), which deals with Gleason's theorem, considered by some (but not the author of this paper) to be of considerable importance in the foundations of quantum theory. This concerns the (non-distributive) lattice 𝓛 of closed subspaces of a Hilbert space H of dimension greater than two. In this lattice the analogues of set-theoretic complements are orthogonal complements. Gleason's theorem states that if m is a normalized, countably additive measure on 𝓛 in a suitable sense, then there exists a non-negative, self-adjoint operator S on H with trace 1 such that

$$ m(L) = \operatorname{trace}(S P_L), $$

for all L ∈ 𝓛, where P_L is the orthogonal projection with range L. Hellman showed that a different version of Gleason's theorem cannot be proved in constructive mathematics. Nevertheless, Gleason's original version of the theorem, stated above, is constructively valid; see Richman & Bridges (1999) and Richman (2000). The difference between the two versions relates to the validity of the principal axes theorem, discussed below.
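
In the easy direction the formula is readily checked numerically; the following toy fragment (ours, using NumPy) verifies that a density matrix S yields a non-negative measure summing to 1 over any orthonormal basis of one-dimensional subspaces:

```python
# Toy check of m(L) = trace(S P_L) for a randomly generated density
# matrix S (non-negative, self-adjoint, trace 1).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))
S = X @ X.T                    # positive semi-definite
S /= np.trace(S)               # normalize to trace 1

Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))    # orthonormal basis
m = [np.trace(S @ np.outer(Q[:, i], Q[:, i])) for i in range(3)]
print(m, sum(m))               # non-negative values summing to 1.0
```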

In Hellman (1993b), the author showed that one version of the spectral theorem for unbounded self-adjoint operators is not constructively valid. However, in his original book Bishop (1967, p. 275) had already proved a different version, for a commuting sequence of bounded self-adjoint operators, which is completely acceptable even to classical mathematicians. After Hellman's paper appeared, Ye (1998) published a constructive version of the spectral theorem for an unbounded self-adjoint operator. Hellman's focus on the issue of domains and unboundedness is misguided because an unbounded operator becomes bounded as soon as one makes its domain into a Banach space by endowing it with the graph norm

$$ |||f||| = \sqrt{\|f\|^2 + \|Af\|^2}. $$

The difficulty of determining whether a vector lies in the domain of an unbounded self-adjoint operator is not just a problem for constructivists. The classical theory progressed much more rapidly after it was realized that it was sufficient to specify an explicit domain of essential self-adjointness (a so-called 'core' of the operator) or even a core for the associated quadratic form; see Davies (1980). A considerable amount is now known about the spectral theory of whole classes of differential operators whose domains cannot be identified as standard function spaces; see, for example, Davies (1989).

The principal axes theorem is the classical result that every self-adjoint matrix has a complete orthonormal set of eigenvectors. The theorem is not constructively valid, for good reasons: the eigenvectors may change extremely rapidly as a parameter passes smoothly through a critical value. We invite the reader to check this for the elementary example

$$ A_s = \begin{pmatrix} \varepsilon & s \\ s & -\varepsilon \end{pmatrix}, $$

where ε = 10⁻¹⁰⁰ and s passes through the value 0. The classical version of the spectral theorem provides no insight into the existence of these computational problems, and does not suggest how they might be overcome. One can understand such problems in classical terms, but the constructive approach provides a systematic framework for doing so.
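
The instability is easy to exhibit numerically (our fragment, using NumPy): the eigenvector attached to the lower eigenvalue swings through a right angle as s crosses an interval of width comparable with ε.

```python
# Eigenvectors of A_s = [[eps, s], [s, -eps]] near s = 0.
import numpy as np

eps = 1e-100
for s in (-1e-98, 0.0, 1e-98):
    A = np.array([[eps, s], [s, -eps]])
    _, vecs = np.linalg.eigh(A)              # ascending eigenvalues
    print(f"s = {s:+.0e}: eigenvector {np.round(vecs[:, 0], 3)}")
# Up to sign, the printed vector moves from (1, 1)/sqrt(2) through
# (0, 1) to (1, -1)/sqrt(2): a 90 degree swing over a range ~ eps.
```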

Finally, in Hellman (1998) the author showed that the Hawking–Penrose singularity theorem is not constructively valid. This is an interesting observation, since the theorem has been extremely influential in the subject. It is embarrassing for a radical constructivist but not for a pluralist. It remains extremely hard to say much about the nature of the singularities: one certainly cannot identify them as black holes without the benefit of a 'cosmic censorship hypothesis'. It is very likely that any detailed classification of the singularities will also be constructively valid.

The constructive version of the Hahn–Banach theorem in Bishop (1967, p. 263) applies to normable linear functionals, and Bishop needs to distinguish these from the more general bounded linear functionals. The following example explains why. Take the sequence a defined above using the digits of π and put b_n = 1 if a_n = 1 and also a_m = 0 for all m < n; otherwise put b_n = 0. The sequence b identifies the first occurrence of a thousand consecutive sevens in the decimal expansion of π, if such a sequence exists. Classically b ∈ l²(ℕ), but constructively we cannot assert this (in 2004), because we are not able to evaluate ‖b‖₂ with arbitrary accuracy. Even if we were assured that a sequence of a thousand consecutive sevens existed, we would still not be able to place b in l²(ℕ) constructively unless we were given some information about how large n had to be for b_n = 1.² Nevertheless the formula

$$ \phi(c) = \sum_{n=1}^{\infty} c_n b_n, $$

is constructively well defined for all c ∈ l²(ℕ) and defines a bounded linear functional φ on l²(ℕ).

² Taken literally this sentence is non-sensical. Classically, there is no problem in asserting that b ∈ l²(ℕ). Constructively, one could not be given an assurance about the existence of n for which b_n = 1 without also being given the relevant information about how large n had to be. This is a familiar problem when one tries to compare two incommensurate frameworks.

A linear operator A on a Banach space B is said to be bounded if there exists a constant c such that ‖Ax‖ ≤ c‖x‖ for all x ∈ B. Its norm is the smallest such constant, if that exists. The constructive distinction between bounded and normable operators is related to the fact that there is no effective classical algorithm for determining the norms of bounded operators on most Banach spaces. This is why finding the best constants for various Sobolev embeddings in L^p spaces and other operators of importance in Fourier analysis has occupied mathematicians for decades. The same problem occurs for large finite matrices: standard software packages only provide routines for computing the norm of a sufficiently large n × n matrix with respect to the l^p norm on ℂⁿ for p = 1, 2, ∞.
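
For example, with NumPy (our illustration):

```python
# Standard packages expose the induced matrix norm only for p = 1, 2
# and infinity; there is no routine for, say, the l^3 -> l^3 norm.
import numpy as np

M = np.random.default_rng(0).standard_normal((50, 50))
for p in (1, 2, np.inf):
    # max column sum, largest singular value, max row sum respectively
    print(p, np.linalg.norm(M, ord=p))
```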

4. Computer-assisted mathematics

Over the last 30 years a number of famous problems in pure mathematics have been solved by methods that conform entirely to the standards of classical mathematics, except for the fact that they involve a large amount of computation.

Because it may not be well known to this audience, I will concentrate on the increasing use of controlled numerical methods to prove completely rigorously the existence of, and describe, the solutions of various problems in analysis. Interval arithmetic provides a new version of finite arithmetic. It has an ancient history, but is slowly becoming more important in connection with computer-assisted proofs of theorems in analysis. See Moore (1979), Markov & Okumura (1999), Plum (2001), Plum & Wieners (2002), Breuer et al. (2003) and the Interval Computations Web-Site (2004) for some of the many applications. Its basic entities may be written in the form

$$ x = 1.247\,06^{\,304}_{\,296}\,\mathrm{e}3, $$

where one imposes some upper bound on the number of significant digits allowed. The interpretation of this expression is as the interval x = [x̲, x̄], where x̲ = 1247.062 96 and x̄ = 1247.063 04, but the definitions of the basic operations of arithmetic on the entities do not depend logically upon this intuition, nor upon any commitment to the existence of the real number system. To add two entities one adds the two lower bounds and rounds down to the prescribed number of digits, and also adds the two upper bounds and rounds up to the prescribed number of digits. If one ignores the rounding procedure then u + v = w, where

$$ \underline{w} = \underline{u} + \underline{v}, \qquad \overline{w} = \overline{u} + \overline{v}. $$

The definition of multiplication is similar, but more complicated. One puts uv = w, where

$$ \underline{w} = \min\{\underline{u}\,\underline{v},\ \underline{u}\,\overline{v},\ \overline{u}\,\underline{v},\ \overline{u}\,\overline{v}\}, \qquad \overline{w} = \max\{\underline{u}\,\underline{v},\ \underline{u}\,\overline{v},\ \overline{u}\,\underline{v},\ \overline{u}\,\overline{v}\}. $$

One identifies an integer n with the interval [n, n] and writes u ∼ v if the two intervals overlap, i.e. if

$$ \max\{\underline{u}, \underline{v}\} \le \min\{\overline{u}, \overline{v}\}. $$

One puts x > 0 if x̲ > 0, and x < 0 if x̄ < 0; if neither of these holds then x ∼ 0. One might define π in the system by $\pi \sim 3.141\,592\,65^{4}_{3}$, i.e. the interval [3.141 592 653, 3.141 592 654], without commitment to the existence of an 'exact' value. In interval arithmetic

$$ (x - 1)^2 \sim x^2 - 2x + 1, $$

but the two are not equal. One needs to choose the right way of evaluating an expression to minimize the width of the interval produced.
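
These operations are straightforward to realize in software. The sketch below is ours and purely illustrative: outward rounding is imitated crudely with math.nextafter (Python 3.9 or later), where a production implementation would set the hardware rounding mode or use an established interval library.

```python
# Minimal interval arithmetic with one-ulp outward rounding.
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __sub__(self, other):
        return self + Interval(-other.hi, -other.lo)

    def __mul__(self, other):
        ps = (self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi)
        return Interval(math.nextafter(min(ps), -math.inf),
                        math.nextafter(max(ps), math.inf))

    def overlaps(self, other):               # the relation u ~ v above
        return max(self.lo, other.lo) <= min(self.hi, other.hi)

# (x - 1)**2 gives a narrower interval than x**2 - 2x + 1, as the text
# observes:
x, one, two = Interval(0.9, 1.1), Interval(1.0, 1.0), Interval(2.0, 2.0)
d = x - one
print(d * d)                    # roughly [-0.01, 0.01]
print(x * x - two * x + one)    # roughly [-0.39, 0.41]
```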

In interval arithmetic the standard functions such as sin, log, etc. take intervals to intervals. The programming languages have to be fairly subtle to achieve this. For example, when obtaining

$$ \sin([1.57, 1.58]) \subseteq [0.999\,957, 1], $$

the programming language must take into account the fact that 1.57 < π/2 < 1.58. There is no requirement for the initial interval to be small. Thus

$$ \cos([0, 2]) \subseteq [-0.416\,147, 1]. $$
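
Such packages are indeed available; for instance mpmath's interval context appears to reproduce both enclosures (our check, assuming mpmath is installed and that its iv context behaves as documented):

```python
# mpmath's interval context performs the monotonicity analysis for the
# standard functions automatically.
from mpmath import iv

iv.dps = 15
print(iv.sin(iv.mpf([1.57, 1.58])))   # an interval containing 1
print(iv.cos(iv.mpf([0, 2])))         # roughly [-0.41615, 1]
```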

A systematic description of interval arithmetic has been completed, and programming languages using it are readily available. It allows a rigorous approach to global optimization and makes a small but vital contribution to the solution of certain well-known, nonlinear, elliptic, partial differential equations. See Plum (2001), Plum & Wieners (2002), Breuer et al. (2003) and references therein. In these papers the authors start by searching for approximate solutions on an experimental basis. They then prove various analytical results which establish that one can use a contraction mapping principle to prove the existence of true solutions close to the approximate solutions, provided certain inequalities hold. Finally, they verify the inequalities rigorously using interval arithmetic.

It seems clear that this approach to nonlinear PDEs will expand rapidly over the coming decades. Those of us who feel uneasy about computer-assisted proofs will either have to come to terms with them, or assign an ever-increasing fraction of their subject to a new category. Computers are fallible in different ways from mathematicians, but both are subject to verification by a variety of methods. In the case of computer-assisted proofs these range from formal proofs of correctness to the writing of independent programs, which are then run on machines with different operating systems. Absolute certainty is a chimera in both contexts, as the classification of the finite simple groups and the solution of Kepler's sphere packing problem have shown.

Journal editors are struggling to come to terms with this situation. In the author's opinion they should state in each problematical case exactly what degree of confidence they have in each part of a very complex proof. This will provide the best information for future generations to assess whether they wish to rely upon the result and, if not, which parts need further attention. Programs should probably be archived for reference, if this is practically possible, but the checking of badly written programs will never be as convincing as the production of better ones.

References

Billinge, H. 2003 Did Bishop have a philosophy of mathematics? Phil. Math. 11, 176–194.

Bishop, E. 1967 Foundations of constructive analysis. New York: McGraw-Hill.

Bishop, E. & Bridges, D. 1985 Constructive analysis. Grundlehren der mathematischen Wissenschaften, vol. 279. Heidelberg: Springer.

Breuer, B., McKenna, P. J. & Plum, M. 2003 Multiple solutions for a semilinear boundary value problem: a computational multiplicity proof. J. Differ. Equations 195, 243–269. (doi:10.1016/S0022-0396(03)00186-4.)

Bridges, D. 1998 Constructive truth in practice. In Truth in mathematics (ed. H. G. Dales & G. Olivieri), pp. 53–69. Oxford: Clarendon Press.

Bridges, D. & Luminita, V. 2003 Apartness spaces as a framework for constructive topology. Ann. Pure Appl. Logic 119, 61–83. (doi:10.1016/S0168-0072(02)00033-7.)

Carnap, R. 1950 Empiricism, semantics, and ontology. Rev. Int. Phil. 4, 20–40. [See also Supplement to 'Meaning and necessity: a study in semantics and modal logic', enlarged edition. Chicago: University of Chicago Press, 1956.]

Davies, E. B. 1980 One-parameter semigroups. LMS Monographs, vol. 15. London: Academic Press.

Davies, E. B. 1989 Heat kernels and spectral theory. Cambridge Tracts in Mathematics, vol. 92. Cambridge: Cambridge University Press.

Davies, E. B. 2003a Empiricism in arithmetic and analysis. Phil. Math. 11, 53–66.

Davies, E. B. 2003b Quantum mechanics does not require the continuity of space. Stud. Hist. Phil. Mod. Phys. 34, 319–328. (doi:10.1016/S1355-2198(03)00003-0.)

Davies, E. B. 2004 A defence of pluralism in mathematics. Preprint. Available at http://philsci-archive.pitt.edu/archive/00001681.

Hellman, G. 1993a Gleason's theorem is not constructively provable. J. Phil. Logic 22, 193–203. (doi:10.1007/BF01049261.)

Hellman, G. 1993b Constructive mathematics and quantum mechanics: unbounded operators and the spectral theorem. J. Phil. Logic 22, 221–248. (doi:10.1007/BF01049303.)

Hellman, G. 1998 Mathematical constructivism in space-time. Br. J. Phil. Sci. 49, 425–450. (doi:10.1093/bjps/49.3.425.)

Interval Computations Web-Site 2004 http://www.cs.utep.edu/interval-comp/main.html.

Markov, S. & Okumura, K. 1999 The contribution of T. Sunaga to interval analysis and reliable computing. In Developments in reliable computing (ed. T. Csendes), pp. 167–188. Dordrecht: Kluwer.

Moore, R. E. 1979 Methods and applications of interval analysis. Philadelphia: SIAM.

Plum, M. 2001 Computer-assisted enclosure methods for elliptic differential equations. Lin. Alg. Appl. 324, 147–187. (doi:10.1016/S0024-3795(00)00273-1.)

Plum, M. & Wieners, C. 2002 New solutions of the Gelfand problem. J. Math. Anal. Appl. 269, 588–606. (doi:10.1016/S0022-247X(02)00038-0.)

Richman, F. 2000 Gleason's theorem has a constructive proof. J. Phil. Logic 29, 425–431. (doi:10.1023/A:1004791723301.)

Richman, F. & Bridges, D. 1999 A constructive proof of Gleason's theorem. J. Funct. Anal. 162, 287–312. (doi:10.1006/jfan.1998.3372.)

Ye, F. 1998 On Errett Bishop's constructivism—some expositions, extensions and critiques. Ph.D. thesis, Princeton University.

Discussion

D. B. A. EPSTEIN (Department of Mathematics, University of Warwick, UK). We could use an outlined, hollow version of the symbol ∃ ('hollow exists') for the classical ∃, and the ordinary solid ∃ for the constructive ∃.

E. B. DAVIES. Yes, suggestions of this type have been made. Your proposal, with its judgemental overtones, is more likely to be received well by a constructivist than by a classical mathematician! The problem with trying to amalgamate the two frameworks in this way is that once one gets more deeply into the two approaches, one finds them proceeding on divergent paths, particularly in subjects using set theory heavily, such as topology. My guess is that one would eventually have to use distinguishing fonts for so many concepts that the proposal would be counterproductive. Even if this were not the case, may I make an analogy with English and French? One could regard them as a single language in which words that we now regard as translations of each other are considered instead to be synonyms, possibly with different shades of meaning. Would this actually help anything?

D. B. A. EPSTEIN. Moe Hirsch suggests a half-life for the truth of theorems, or rather for one's degree of belief in them. If a theorem has not been reproved or re-used, then one's degree of belief decreases and eventually vanishes.

E. B. Davies2458

Phil. Trans. R. Soc. A (2005)

E. B. DAVIES. This is a different issue, but nevertheless an important one. Mathematicians value theorems more highly if they have connections with other results than if they are totally isolated. In spite of their commitment to proofs being true or false as they stand, mathematicians appear to like theories that have highly redundant logical structures. It seems that they do not wholly trust their ability to avoid error when following a long logical argument; and there is quite a lot of historical evidence that they should not trust proofs that have not been confirmed by some independent evidence. Such a statement would not be regarded as controversial in any experimental subject, but many mathematicians do not like to admit that it also applies to their own subject.

J. M. NEEDHAM (Department of Computer Science, University of Bath, UK). How do you propose to run the classical world and the constructive world side by side?

E. B. DAVIES. I do not personally think this is a fundamental problem. When I play chess I manage to remember that its rules are different from those of checkers, and when I study vector spaces I remember which theorems only work in finite dimensions and which generalize to infinite dimensions. I manage to remember that the standard form of remainder in Taylor's theorem does not work for complex-valued functions. If one wants to remember another distinction then one can. Of course it is easier if one does this from the beginning rather than when one is older. A valuable guideline is that if one has a classical proof that provides a stable procedure for computing the answer, then it almost surely has a constructive analogue.

I would stress that the classical and constructive worlds are indeed different: one is not discussing 'the real numbers' and arguing about whether some proof about them is acceptable. One is studying two different entities that have similar but different properties. One might compare the integers as a classical mathematician thinks about them and the integers as treated in a typical computer program. In both cases addition leads to a definite result, but for very large integers the computer's output might be an infinity or error symbol. We consider that one system is right and the other is wrong, but that is because we consider that the computer is trying to implement our ideas and not fully succeeding in doing so. In the case of classical versus constructive mathematics no such judgement is possible.

E. B. DAVIES (additional comment). Those working in and promoting constructive mathematics are well accustomed to hearing comments to the effect that there is no evidence that it has any contributions to make to 'real mathematics'. A few remarks are worth making about this. In some fields, such as algebraic geometry, this may well be true, but that does not mean that it is bound to be equally true in others. Nobody would (or should) claim that constructive analysis leads to the routine solution of difficult problems. Highly original ideas often enable one to solve problems that were previously intractable, and this will remain the case whether one uses classical or constructive methods. One of my goals was to persuade people to move beyond the commonplace view that classical mathematics is somehow 'right' and other approaches thereby 'wrong'.


The areas in which constructive mathematics does provide valuable insights are those close to numerical analysis and other fields in which the existence of explicit estimates is of the essence. In these subjects a number of ordinary working mathematicians have found that an awareness of constructive mathematics helps them to understand better the nature of the problems that they are facing.


Abstracts of additional presentations made at the Royal Society Discussion Meeting 'The nature of mathematical proof'

Social processes and mathematical proof in mathematics & computing: a quarter-century perspective

By Richard Lipton
Georgia Institute of Technology, Atlanta, GA, USA

Twenty-five years ago we (DeMillo, Lipton, Perlis) wrote a paper on how mathematics is a 'social process'. In particular, real proofs are tested and checked by a complex social process. One of the consequences of our position is that it is unlikely that real computer systems can or will ever be proved correct. The core of the argument is a careful examination of the difference between formal proofs and real proofs. In this talk I will present the main argument that we made. Actually, the changes in modern computer technology make it even more applicable today than twenty-five years ago.

Machine computation and proof

By Robert D. MacPherson
School of Mathematics, Institute for Advanced Study, Princeton, NJ, USA

In 1609, Kepler made a beautiful conjecture about spheres in space. It was one of the oldest unsolved problems in mathematics. In 1998, Tom Hales produced a brilliant computer-assisted proof of the Kepler conjecture. By now, the theoretical part of Hales' proof has been refereed as usual mathematical papers are, but the parts involving the computer have resisted all efforts at checking by humans. Should we think of the Kepler conjecture as proved? This talk will examine various aspects of this story and the questions it raises, from the point of view of a practicing mathematician.

