
Encyclopedia of Nanoscience and Nanotechnology

    Nanocomputers: Theoretical Models

Michael P. Frank, University of Florida, Gainesville, Florida, USA

    CONTENTS

    1. Introduction

    2. Fundamental Physics of Computing

    3. Traditional Models of Computation

4. New Models of Nanocomputers

5. Generic Realistic Model of Nanocomputers

    6. Specific Nanocomputing Technology Proposals

    7. Conclusion

    Glossary

    References

    1. INTRODUCTION

In this chapter, we survey a variety of aspects of theoretical models of computation, with an emphasis on those modeling issues that are particularly important for the engineering of efficient nanoscale computers.

Most traditional models of computing (such as those treated in Savage's textbook [1]) ignore a number of important fundamental physical effects that can dramatically impact computational performance at the nanoscale, such as the basic thermodynamic limits on information processing [2] and the possibility of utilizing quantum physical superpositions (essentially, weighted combinations) of logic states [3]. New, alternative models of computing that allow reversible computing [4], while respecting the laws of thermodynamics, may (gradually, over the next 50 years) achieve a level of performance and cost efficiency on all types of computational tasks that is literally thousands of times greater than the best that is physically possible using conventional irreversible models [5]. Also, those models that are not only reversible but also allow coherent quantum computing, based on self-interference of entangled superpositions of states, furthermore permit expressing algorithms (for at least some special-purpose problems) that require exponentially fewer steps in these models than the best known algorithms in the older models that do not [3].

Because of such discrepancies, the scope and precision of our models of computation must be revised and extended in order to take these new considerations into account, if we wish our models to continue to be an accurate and powerful guide to the engineering design of computer architectures and to the performance of algorithms, even as technology approaches the nanoscale. We describe some ways in which this has already been done, by the author and others, show some example results of this modeling effort (such as the quantitative performance advantages of reversible and quantum computing quoted previously), describe a variety of proposed nanocomputing technologies, and identify which technologies have the potential to implement these most powerful models. We conclude with a discussion of future work that is needed to further develop and flesh out these models to the point where future nanocomputer engineers and programmers will find them maximally useful.

    1.1. Definition of Nanocomputer

For purposes of this chapter, a nanocomputer is simply any computer whose characteristic length scale (the average spacing between the centers of neighboring primitive functional components, such as transistors or other switching elements) falls within the three-orders-of-magnitude-wide range that is centered on 1 nanometer (that is, 0.032 to 32 nanometers). (Anything in the next larger range might be better called a microcomputer, and anything in the next smaller range, if that were possible, a picocomputer.)
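The stated endpoints of this range follow from centering it logarithmically on 1 nm; the short snippet below (an illustrative check, not part of the original chapter) reproduces the arithmetic.

# A range three orders of magnitude wide, centered logarithmically on 1 nm,
# runs from 10**-1.5 nm to 10**+1.5 nm.
lower_nm = 10 ** -1.5   # ~0.032 nm
upper_nm = 10 ** 1.5    # ~32 nm
print(f"nanocomputer pitch range: {lower_nm:.3f} nm to {upper_nm:.1f} nm")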

Under this definition, note that even traditional semiconductor-based computers are expected to qualify as nanocomputers in only about 13 more years; to wit, the semiconductor industry's goals [6] specify that the pitch between neighboring wires should fall only slightly above this range, specifically at a level of 44 nanometers, by 2016.

Furthermore, the semiconductor industry's stated milestones such as this have historically proven conservative, and indeed, Intel [7], IBM [8], and AMD [9] have already demonstrated 10-nm gate-length field-effect transistors in the lab, which, if aggressively packed together, might allow a pitch below the 30-nm nanoscale mark within 10 years, which historically is the approximate lag time between laboratory demonstrations of transistors and their availability in commercial processors.

ISBN: 1-58883-062-4/$35.00. Copyright 2004 by American Scientific Publishers. All rights of reproduction in any form reserved.

Encyclopedia of Nanoscience and Nanotechnology, Edited by H. S. Nalwa

Volume 6: Pages 249-300


(Of course, if some alternative, non-transistor-based nanotechnology development proceeds especially rapidly, this scale might be reached even sooner.)

Note that by focusing on the pitch rather than the diameter of the primitive elements, we insist that computers based on narrow-diameter components, such as the carbon nanotube [10] or semiconductor nanowire [11] logic gates that have already been demonstrated, would not count as viable nanocomputers unless the average spacing, as well as the size, of these devices across a large and economically manufacturable array is made sufficiently small, which has not yet been accomplished but may be in the future.

1.2. Theoretical Models of Computing: Key Model Components

Now, what do we mean by a theoretical model of computing? In general, a theoretical model of any size computer (whether nano or not) can involve a number of different aspects that are relevant to computer engineering, any of which may be more or less abstract (or even left completely unspecified) in any particular model. These modeling areas include:

1. A device model specifies the physical and/or information-processing characteristics of the individual, lowest level information-processing functional elements (devices, which we will sometimes call bit devices when they are based on binary information encodings, to distinguish them from larger machines) within the computer.

2. A technology scaling model specifies how device characteristics change as the physical dimensions of the devices are scaled to smaller sizes, when this is possible.

3. An interconnection model specifies how information is communicated between devices. When wires are considered to be one of these types of devices, the interconnect model can be considered part of the device model. But wires and other types of permanent physical structures are not the only possible way for devices to communicate; various types of interconnects involving physical entities moving through free space are also conceivable. The precise nature of the interconnection model has greater implications than one might at first expect.

4. A timing model specifies how the activities of different devices are to be synchronized with each other. The timing model also has more of an impact than might at first be realized.

5. A processing architecture model, or just architecture, specifies how devices are functionally connected to form a larger unit called a processor, which is complex enough to be programmed to carry out any desired type of computation, or at least, to carry out a specific type of computation on different input information.

6. A (capacity) scaling model is a more general sort of architecture (sometimes called an architecture family [12]) that allows the capacity of the processor (in bits of storage, and/or ops-per-cycle of performance) to be scaled up, via some specified regular transformation, to ever-larger sizes. This stands in contrast to nonscalable architectures where the processor is specified to have a fixed, constant number of bits of state and ops-per-cycle of performance. The most common type of scaling model is a multiprocessor scaling model, which defines larger processors as simply being assemblages of smaller processors that are interconnected together, using some processor-level interconnect model, which might be different from the interconnect model that is used at the device level.

7. An energy transfer model specifies how clean power is to be supplied, and dirty power (waste heat) removed, from all parts of a scaled processor. The energy system can also be viewed from an informational perspective as supplying known information in a standard state (in the stable power signal) and removing unknown information (entropy, in the waste heat). As such, it is subject to the fundamental limits on information processing to be discussed.

8. A programming model specifies how the computer can be configured to carry out different types of computations (as opposed to just performing the same computation on different input information). A programming model that happens to be based on traditional types of computer machine-language instructions is called an instruction set architecture, but other, radically different types of programming models also exist, such as those that are used today to program field-programmable gate arrays (which are general-purpose reconfigurable logic circuits) and dataflow-style machines, as well as earlier, more abstract models such as Turing machines and cellular automata. A high-level programming language is a very abstract sort of programming model that is relatively far removed from the architecture of any specific machine and that is usually translated into a more architecture-specific programming model, such as machine language, before execution by the hardware.

9. An error handling model sets forth a scheme for dealing with hardware errors, whether they be defects (persistent errors due to malformation or degradation of device structures during manufacturing or operation) or faults (dynamically arising temporary errors, due to, for example, thermal noise, cosmic rays, energy leakage, or quantum tunneling). Techniques such as error-correcting codes (J. Baylis, Error-Correcting Codes, Chapman & Hall, London, 1998) and defect-tolerant architectures (J. Heath, P. Kuekes, G. Snider, and S. Williams, "A Defect-Tolerant Computer Architecture: Opportunities for Nanotechnology," Science 280, 1716 (1998)) can dynamically detect such errors and correct them (in the case of faults) or work around them (in the case of defects). Note that each new error occurrence generates some entropy, which must eventually be removed from the machine by the energy transfer system if it is not to accumulate to the point of total degradation of the machine's structure.

10. A performance model can be used to determine quantitatively (to some degree of accuracy) how quickly any specific algorithm implemented in the programming model will execute on the specific architecture.


Performance can be considered a special case of cost efficiency, in which cost is considered to be directly proportional to time (which is appropriate in many circumstances).

11. A cost model quantifies the cost, according to one or more measures, of manufacturing a computer of given capacity, and/or of executing a specified computation on it. Note that a performance model can actually be considered to be a special case of a cost model in which there is a single cost measure, namely execution time; performance (or quickness) is just the reciprocal of this. As we will discuss, it is also useful to consider other physically reasonable cost measures such as energy costs, spacetime-proportional costs, and total dollar cost of both energy and spacetime. Whatever the cost measure, cost efficiency (or just efficiency) in general is the ratio between the minimum possible cost to perform a computation and the actual cost (this ratio is restated symbolically just after this list). Even if the minimum possible cost of a computation is unknown, we know that to maximize the cost efficiency of a given task, we must minimize its actual cost.
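As a compact restatement of the definition just given (my notation, not the chapter's own), writing C_min for the minimum possible cost of a computational task and C_actual for the cost actually incurred:

$$ \text{cost efficiency} \;=\; \frac{C_{\min}}{C_{\text{actual}}}, \qquad 0 < \text{cost efficiency} \le 1, $$

so, since C_min is fixed for a given task, maximizing the cost efficiency is equivalent to minimizing C_actual, whether or not C_min itself is known.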

    1.3. Desiderata for Models of Computing

What do we want our models of computing to be like? Well, here are some properties that we might desire a computer model to have:

Ease of programming: The programming model (when specified) should be intuitive and easy for programmers to understand and work with.

Ease of implementation: It should be possible and straightforward to design and build actual physical implementations of the given architecture.

Physical realism: The predictions of the cost/performance model should be, at least approximately, physically realizable in feasible implementations. This feature, though it is very important for real-world engineering purposes, is unfortunately neglected by many of the theoretical models that have been studied in some of the more pure-mathematics-oriented branches of computer science (more on this issue later).

Efficiency: The cost efficiency achievable by programs running on top of direct physical implementations of the model should be as high as possible, ideally close to 100% (best possible), but in practice at least lower-bounded by some constant minimum level of efficiency that holds independently of parameters of the application (more on this later).

Technology independence: If possible, the model should be applicable to a wide range of different possible technologies for its physical implementation, rather than being tied to a very specific technology. Later we will give an example of a technology-independent model. However, technology-specific models do also play an important role, for example, for accurately assessing the efficiency characteristics of that particular technology.

    1.4. Physical Realism

A theoretical model of computation (at whatever level of abstraction) that includes at least a programming model, an architecture, and a performance or cost model will be called physically realistic (abbreviated PR) if it does not significantly overstate the performance (or understate the cost) for executing any algorithm on top of physically possible implementations of that architecture. Physical realism is also (somewhat cryptically) termed congruence in some of the parallel computing literature (cf. [83]).

As we will survey, not all of the theoretical models of computation that have traditionally been studied by computer scientists are actually physically realistic, according to our best-available present-day understanding of physical law; some even overstate performance (or, more generally, cost efficiency) by multiplicative factors that become unboundedly large as one increases the capacity of the machine. These factors can be anywhere from polynomially large in the machine capacity [e.g., for irreversible three-dimensional (3D) mesh models, which ignore the laws of thermodynamics] to exponentially large or even larger (e.g., seemingly so for nondeterministic models, and also for unrealistically profligate interconnect models that ignore the speed-of-light limit). This lack of realism may be acceptable from the perspective of studying the pure mathematical structure of various models in the abstract, but, as engineers, we prefer for our models to correspond well to reality. So we must be careful in our modeling not to overstate what physics can do, or we may mislead ourselves into wasting time designing, building, and programming machines that cannot possibly live up to the unrealistic performance expectations we may have for them if we believe some of the traditional models.

    1.5. Scalability of Efficiency

Similarly, computer modeling will also not serve us well if it significantly understates what physics can do, that is, if the architecture and programming model do not allow one to express algorithms that are as cost efficient as is physically possible. Of course, no specific mechanism (other than raw physics itself) can be expected to be exactly maximally cost efficient for all computations, but we will argue that it is possible to devise physically realistic models that understate the best physically possible cost efficiency of computations by only, at most, a reasonably small (that is, not astronomically large) constant factor, and one that furthermore does not increase as the machine is scaled to larger capacities. We call such models universally maximally scalable (UMS). Models possessing this UMS property (in addition to physical realism) would be the ideal architectural templates for the detailed design and engineering of future general-purpose nanocomputers, since they would pose no barrier to an application algorithm designer's choosing and programming the most cost-efficient, physically possible algorithm for any given computational task.

As we will survey, out of the various physically realistic traditional models of computation that have been proposed to date, absolutely none so far have qualified as UMS models. We describe some new candidates for UMS models that


Table 2. Summary of ways in which physics offers opportunities for more cost-efficient computing than would have been thought possible using earlier, physically realistic (but overly conservative) models of computation that ignored those opportunities.

Physical observation 1: Events can occur in different places at the same time.
Opportunity: Parallel computation; a machine can perform different operations in different devices simultaneously [23].

Physical observation 2: Our universe has three (and only three) usable spatial dimensions.
Opportunity: The number of bit locations accessible within a given time delay can scale up as quickly as (at most) the third power of the delay; this fact can help the performance of some parallel algorithms, compared to the 2D or 1D cases [17, 23].

Physical observation 3: Some types of physical transformations can occur in a way that generates an amount of entropy approaching zero.
Opportunity: Such nearly reversible transformations can perform computations with less loss of free energy, and as a result less total cost in many situations, compared to irreversible methods. This is called reversible computing [4].

Physical observation 4: Quantum mechanics allows a system to be in a superposition (weighted sum) of many distinct states simultaneously.
Opportunity: Carefully controlled systems that use superposition states can take shortcuts to arrive at solutions to some problems in many fewer steps than is known to be possible using other methods. This is called quantum computing [3].

Note: Parallelism is already heavily used at many levels, from logic circuits to wide-area distributed computing, as are architectural configurations that take some advantage of all three dimensions of space, though to a limited extent (constraint 6 in Table 1 is an important limit on three-dimensional computing today). Reversible computing and quantum computing, in contrast, are still very much in the research stage today, but they are both expected to become increasingly important for competitive computing as device sizes approach the nanoscale. Reversible computing is important because it directly alleviates constraints 5 and 6 from Table 1 (which are already relevant, in a scaled-up way, today), and quantum computing offers a totally new class of algorithms that will be important in and of themselves for certain problems, regardless of whether quantum computing turns out to be useful for general-purpose algorithms, or whether general-purpose nanocomputing itself even becomes feasible.

[25]. Fundamentally, this is because not all pairs of quantum states are totally mutually exclusive with each other. Rather, states can overlap, in a certain mathematically well-defined way, which results in the phenomenon that a system that is prepared in a certain state has a necessarily diffuse sort of presence, so to speak, a presence that extends to partly include other, sufficiently nearby states as well.

    2.1.2. State Space

The state space of a system is the set of all of its possible states. In quantum theory, a state space has the mathematical structure of a Hilbert space (a complex vector space having an inner product operation).

    2.1.3. Dimensionality

The dimensionality of a system's state space is simply the number of states in a maximum-sized set of states that are all mutually exclusive (mutually orthogonal vectors). For example, the spin orientation of a single spin one-half subatomic particle has two distinguishable states and thus has a state space of dimensionality 2. Two such particles together have four distinct states, and a four-dimensional state space.

2.1.4. Amount of Information

The total amount of information contained in a system is just the logarithm of the dimensionality of its state space, that is, the logarithm of its maximum number of mutually distinguishable states [26]. The base of the logarithm is arbitrary and yields a corresponding unit of information. Taking the log base 2 measures the information in units of bits (binary digits). Using the natural logarithm (base e ≈ 2.718), the corresponding information unit is called the nat. The physical quantities that are traditionally known as Boltzmann's constant k_B and the ideal gas constant R are simply different names for 1 nat of information, but they are usually expressed in different (though still compatible) units, such as joules per kelvin, or kilocalories per mole per kelvin, and are usually also reserved specifically for discussing information that happens to be entropy.
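In symbols (a restatement of the definitions above, in my own notation), if D is the dimensionality of the state space, then the amount of information is

$$ I \;=\; \log D \;=\; (\log_2 D)\ \text{bits} \;=\; (\ln D)\ \text{nats}, \qquad 1\ \text{nat} \;=\; k_B \;\approx\; 1.38\times 10^{-23}\ \mathrm{J/K}. $$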

Note that the total amount of information contained in any system is constant over time, so long as its maximum number of states is also. This is the case for any system with constant total energy and volume.

    2.1.5. Information

The specific information that is in a system (as opposed to the amount of information) is the particular choice of state, itself. We can say that the actual state of a system is the information in the system.

    2.1.6. Entropy

Entropy S was originally just an unnamed, abstract quantity (the ratio between heat and temperature) of unknown physical significance when its usefulness in thermodynamic calculations was first recognized by Rudolph Clausius in 1850. But entropy is now understood to simply represent that portion of the information in a system that is not redundant (correlated) with the information in other parts; that is, it cannot be derived from the other information. As such, the distinction between which pieces of physical information are effectively entropy, and which are not, depends, to some extent, on the information-processing capabilities of the entity that might be doing the deriving. A specific body of information may appear at first to be haphazard and random, but with sufficient processing, we may eventually notice an underlying order to it.

Right now, the amount of information that is under explicit control within our computers is just a tiny fraction of the total physical information in the world around us,


and so we do not notice the effect that information processing capabilities can have on entropy. But, as computation approaches the nanoscale, an increasingly large fraction of the information inherent in the physical material making up our computer circuits will be explicitly manipulated for computational purposes, and as a result, the ultimate computational nature of entropy will start to become more and more apparent. As we will see, it turns out that the amount of entropy that a nanocomputer produces actually depends heavily on whether its design recognizes that all of the information that it deterministically computes is actually not entropy, since it was derived from other information in the machine and therefore is redundant with it. Current machine designs ignore this fact and simply discard intermediate results after they are no longer needed, irreversibly committing them to the great entropy dump in the sky. (Literally: the discarded information flows out of the machine and eventually out into space.)

So, to sum up, entropy is defined as simply any and all information whose identity (as opposed to amount) happens to be unknown by a given entity of interest, an entity whose interactions with the system we are concerned with describing. (The entity in question can itself be any kind of system, from a human to a logic gate.) The state of knowing can itself be defined in terms of the presence of accessible correlations between the state of the knower and the state of the system in question, but we will not get into that here.

    2.1.7. Subsystems

Consider a maximal set of distinguishable states of a system. If this set is partitioned into N equal-sized subsets, then the selection of one subset from the partition can be considered a part of the state of the whole system. It corresponds to a subsystem of the original system. The amount of information in the subsystem is log N. This much of the whole system's information can be considered to be located in the subsystem. Two subsystems are independent if they partition the state space along independent (orthogonal) directions, so to speak. (This concept can be made more precise, but we will not do so here.) A set of mutually independent subsystems is complete if specifying the state of each subsystem is enough to specify the state of the whole system exactly. A minimal-sized subsystem (one that cannot be further broken down into independent subsystems) is sometimes also called a degree of freedom.

    2.1.8. Bit Systems

A bit system, or just bit, is any degree of freedom that contains only 1 bit of information; that is, a bit is a partition of the state set into two equal-sized parts. Note the dual usage of the word bit to refer both to a unit for an amount of information and to a system containing an amount of information that is equal to that unit. These uses should not be confused. Systems of sizes other than 1 bit can also be defined, for example bytes, words, etc.

    2.1.9. Transformations

A transformation is an operation on the state space, mapping each state to the corresponding state resulting from the transformation. It is a fundamental fact of quantum mechanics (and all Hamiltonian mechanical theories, more generally) that the transformations corresponding to the passage of time are reversible (that is, one-to-one, invertible, bijective). The size of a given transformation can be described in terms of the average distance between old states and new states, by some appropriate metric.

    2.1.10. Operations

A primitive orthogonalizing operation (or just operation for short) is a transformation that maps at least one state to some new state that is distinguishable from the original state, and that cannot be composed of smaller operations. An operation is on a particular subsystem if it does not change the state of any independent subsystem. An operation on a bit system is called a bit operation (and similarly for other sizes of systems). Two operations commute if performing them in either order has the same net effect. Operations on independent systems always commute.

    2.1.11. Transformation Trajectory

A transformation trajectory is a transformation expressed as a sequence of (primitive orthogonalizing) operations, or pieces of such operations, operating on individual degrees of freedom (e.g., a quantum logic network; see Nielsen and Chuang [3]).

    2.1.12. Number of Operations

The total number of operations that take place along a given transformation trajectory can be defined. Planck's constant h (or ℏ, defined as h/2π) can be viewed as a unit for expressing a number of operations. The unreduced Planck's constant h represents two primitive operations (for example, a complete rotation of a particle spin through an angle of 360 degrees), while the reduced constant ℏ represents a fraction 1/π of a primitive operation, for example, a rotation of a spin through an angle of only 1 radian.
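Expressed symbolically (my own condensed restatement, consistent with the definitions above):

$$ h \;\leftrightarrow\; 2\ \text{ops} \quad (\text{e.g., a spin rotation through } 2\pi), \qquad \hbar \;=\; \frac{h}{2\pi} \;\leftrightarrow\; \frac{1}{\pi}\ \text{op} \quad (\text{a rotation through 1 radian}). $$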

    2.1.13. Steps

A complete parallel update step, or just step, is a transformation of a system that can be described by composing operations on each subsystem in some maximal, complete set of subsystems, such that the total number of operations in bit-ops is equal to the amount of information in bits. In other words, it is a complete overhaul of all of the state information in the system, whereas an operation on the system only potentially changes some part of the state.

    2.1.14. Dynamics

The dynamics of a system specifies a transformation trajectory that is followed as the system evolves in time.

    2.1.15. Amount of Time

Given the dynamics, the amount of time itself can be defined in terms of the number of steps taken by some fixed reference subsystem, during a given trajectory taken by the system.


Note that if the system and the reference subsystem are both taken to be just the whole universe, then time just represents the total amount of change in the universe, in terms of the number of parallel update steps performed. (Such concepts hearken back to the relativist philosophies of Leibniz and Mach which helped inspire Einstein's general relativity [27].)

    2.1.16. Energy

Now, the energy in a subsystem is the rate at which primitive operations are taking place in that subsystem, according to its dynamics. In other words, energy is activity; it is computing itself. This can be proven from basic quantum theory [16, 18, 28].

As a simple way to see this, consider any quantum system with any subsystem whose physical Hamiltonian induces any two energy eigenstates of distinct energies; call these states |0⟩ and |2E⟩ arbitrarily. Now, if the subsystem happens to be in the state |0⟩ + |2E⟩, which has (expected) energy E, then the quantum time evolution given by the system's Hamiltonian takes it to the orthogonal state |0⟩ − |2E⟩ in time (h/4)/E. Margolus and Levitin [18] show that a system with energy E can never change to an orthogonal state any faster than this, no matter what its initial state. Therefore, we can say that any E-sized chunk of energy is, every h/4E of time, performing the operation "If I am in this subsystem and its state is |0⟩ + |2E⟩, make its state |0⟩ − |2E⟩; otherwise, do nothing." This transformation counts as an operation, by our definition, because it does orthogonalize some states. However, this particular operation is somewhat limited in its power, because the subsystem in question subsequently immediately cycles right back to its original state. We call this special case an inverting op (iop); its magnitude in terms of Planck's constant is (h/4). Margolus and Levitin show that an op that instead takes a system to the next state in a repeating cycle of N states requires more iops' worth of time, in fact, 2(N−1)/N times more, or [(N−1)/2N] h.

In the limit as the cycle length N approaches infinity (as it does in any complex system), the time per orthogonal transition approaches 2 iops' worth, or (h/2), so we define this as the magnitude of a generic op, as previously.

Incidentally, when applying the Margolus-Levitin relation to the example of a simple freely rotating system, N can be argued to be equal to the system's total angular momentum quantum number l plus 1, and with a few more steps, it turns out that the relation can be used to independently confirm the usual angular momentum quantization formula L/ℏ = √(l(l+1)), much more easily than by the usual derivation found in quantum physics textbooks.
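For reference, the bound just discussed can be collected into a single line (a standard statement of the Margolus-Levitin result, written in my own notation rather than quoted from the chapter):

$$ t_{\perp} \;\ge\; \frac{h}{4E}, \qquad t_{N} \;\ge\; \frac{N-1}{N}\cdot\frac{h}{2E} \;\longrightarrow\; \frac{h}{2E} \quad (N \to \infty), $$

where E is the subsystem's expected energy, t_perp is the minimum time to reach any orthogonal state, and t_N is the minimum average time per orthogonal transition for a system running through a repeating cycle of N distinct states.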

    2.1.17. Heat

Heat is just the energy in those subsystems whose state information is entirely unknown (entropy).

    2.1.18. Temperature

The temperature of a subsystem is the average rate at which complete update steps are taking place in that subsystem (i.e., the average rate of operations per bit) [28]. Note that energy divided by temperature gives the amount of information in the subsystem. This is, historically, how physical information (in the form of entropy) was first noticed as an important thermodynamic quantity (by Rudolph Clausius, in 1850), even before the fact that it was really just information was understood.

Note that the reciprocal of temperature is just the time required for a complete step that, on average, updates all parts of the state information, once each.

This definition of temperature, in contrast to traditional ones, is general enough that it applies not just to systems in a maximum-entropy equilibrium state (all of whose information is entropy), but more generally to any system, even systems in a completely known state with no entropy, which according to traditional definitions of temperature would always be at absolute zero. However, for any system we can also identify a thermal temperature, which is the average temperature of any of its subsystems whose state information is entropy, and then consistently define that the thermal temperature of a system having no entropy is zero. Thermal temperature is, then, just the traditional thermodynamic concept of temperature. But our more general temperature concept is somewhat more flexible.
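Restating these relations in symbols (my own shorthand for the definitions above, not notation used in the chapter): for a subsystem with energy E, amount of information I, and generalized temperature T,

$$ I \;=\; \frac{E}{T}, \qquad \frac{1}{T} \;=\; \text{the time required for one complete update step of the subsystem's state}. $$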

Note also that energy spontaneously flows from hot subsystems to cold ones, rather than vice versa, simply because the fast-changing pieces of energy in the hot system more frequently traverse the trajectories through state space that cross over the boundary between the two systems.

These are enough definitions to support our later discussions in this chapter. A more complete discussion of the relationships between physics and information processing can be found in [24] (in progress).

Discussions of quantum information and how it extends the classical concepts of information can be found in [3].

3. TRADITIONAL MODELS OF COMPUTATION

In this section, we systematically survey a wide variety of early models of computing and identify exactly which of the limits or opportunities listed in the previous section each one fails to account for, failures that result in the models lacking one or the other of the properties of PR or UMS. Furthermore, we consider what types of costs are respected in each model.

In the next section, we will present the newer models, which may actually come close to being both PR and UMS for all practical purposes in the foreseeable future.

Here are some contributions to real-world costs that an economically thorough cost model ought to take into account:

1. Manufacturing cost to build a machine that can perform a given computation. We may expect this to be roughly proportional to its total information capacity. However, if the machine can be reused for more than one computation, then the cost model should account for this properly (cf. item 3a below).

2. Costs that may be considered to scale roughly proportionally to the execution time of programs, but not to the machine's manufacturing cost, such as, for example, the inconvenience cost to the user of waiting to receive a desired result.


3. Costs that can be expected to scale proportionally to both execution time and manufacturing cost, such as:

a. Hardware rental cost, or essentially manufacturing cost amortized per unit time, given some fixed expected lifetime for the machine.

b. Maintenance and operation costs for the machine per unit time, including the cost of energy used by components that are operating at constant power levels.

c. Opportunity cost foregone by not applying the machine's capacity toward some alternative useful purpose.

4. Total cost of energy spent for the computation. We list this separately from item 3b because later we will see that there are significant components of energy that are not necessarily proportional to spacetime usage.

Traditional computational complexity theory (cf. [29]) considers purely time-proportional costs like item 2, simplified to just the total number of discrete time-steps (clock ticks) performed (i.e., assuming a fixed rate of steps per unit time), and dubbed time complexity. It also considers a rough measure of manufacturing cost, in the form of the total number of bits of storage required to perform the computation, and calls this space complexity. However, most work in complexity theory does not combine the two in the natural way suggested by items 3a-3c, which is that real costs are usually proportional to both space and time, or in other words to the spacetime utilized, that is to say, the cost to rent the required amount of hardware for the amount of time needed to perform the computation, and to other costs that can be assumed to scale proportionally to this quantity, such as maintenance cost, opportunity cost, and energy used (typically).
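As an illustrative formalization of this point (my notation; the chapter does not give such a formula here), a cost model of the kind being advocated might charge

$$ \mathrm{Cost} \;\approx\; c_{ST}\,(S \cdot T) \;+\; c_{E}\,E_{\mathrm{diss}}, $$

where S is the hardware capacity (in bits) occupied, T is the time for which it is occupied, E_diss is the energy actually dissipated, and c_ST, c_E are prices per bit-second and per joule; the point of the surrounding discussion is that the energy term is not always simply proportional to the spacetime term.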

Some cost models in complexity theory count the number of fixed-size computational operations that are performed, rather than the number of parallel steps.

Table 3. We can compare various models of computation as to which fundamental physical limits they violate (see Table 1), which opportunities for more efficient computing they leverage (Table 2), and which aspects of cost they take into account.

[Table 3 compares the following models, row by row: Turing machine [31], RAM machine [32], PRAMs etc., 1D cellular automata [33], 2D cellular automata [33], 3D cellular automata [33], reversible logic networks [34], quantum logic networks [35], reversible 3D mesh [4], and quantum R3M [4]. For each model, the columns indicate which of fundamental limits 1-10 it violates, which of opportunities 1-4 it leverages, and which of cost measures 1-4 it considers; the note below explains how the entries are interpreted.]

Note: Opportunity #2 gives the number of dimensions explicitly or implicitly assumed by the model; three or more is unrealistic, two or less is underambitious. Cost measure #3 (spacetime) is denoted half-way considered if spacetime cost could be easily measured in the model but is typically ignored instead. Note that the quantum reversible 3D mesh model described in Section 6 (first introduced in [4]) strictly dominates all earlier models in realism and comprehensiveness, so long as gravity (limit #10) is not a concern, which we can expect to remain true for a very long time. (This limit would only become relevant if/when we come to build computer systems of near black-hole density, e.g., by building them out of several suns' worth of matter, or alternatively by somehow achieving an average device-pitch scale nearly as small as the Planck length scale of fundamental particles. Both of these possibilities seem extremely distant at present, to say the least.)

This comes closer to spacetime cost but still does not quite hit the mark, since there are real costs even associated with those bits that are just sitting statically and not being operated on at all (hardware rental cost, maintenance cost, opportunity cost).

Newer models such as VLSI theory (very large scale integrated circuits; cf. [30]) address these problems somewhat by considering the hardware efficiency of algorithms, which is essentially the reciprocal of their spacetime usage. However, these models still do not usually integrate the cost of energy into the analysis in a way that treats it as somewhat independent from spacetime cost, which it is, as we will see.

Table 3 summarizes how a variety of existing theoretical models of computation fare with respect to the fundamental limits and opportunities discussed in Section 2, and the costs discussed previously. A discussion follows.

    3.1. Turing Machines

The Turing machine [31] was the first universal, physically evocative (as opposed to totally abstract and formal) model of computation. It has a single fixed-size processor (the head) and a memory laid out in one dimension that is accessed in serial fashion (the tape). The Turing machine model does not violate any of the fundamental physical limits on information processing, and therefore it is physically realistic.

However, since a Turing machine has only a single, fixed-size processor (the tape head), it does not leverage the possibility of parallelism. Multitape and multihead Turing machines provide limited parallelism, but true parallelism in the model requires that the number of processors be scaled up in proportion to the information capacity of the machine. Turing machine models usually do not try to do this, but later we will see other models that do.

It is possible to analyze space and time costs in a Turing machine model, but the joint spacetime cost is not usually a concern, since the model has such understated efficiency in any case.


Due to its drastic suboptimality, the Turing machine and related models are primarily only suitable for the following purposes:

Determining whether a desired class of computations is possible to do at all, even given unlimited resources (its computability).

Proving that other models of computation are universal, by showing that they are capable of simulating a Turing machine.

Determining the time required to perform a computation to a very rough degree of accuracy, that is, to within a polynomial factor (a factor growing as n^k, where n is the size of the input data set and k is any constant). The strong Church's thesis [36] is the hypothesis that Turing machines are satisfactory for this purpose. However, results in quantum computing suggest strongly that ordinary nonquantum Turing machines may actually overstate the physically required minimum time to solve some problems by exponentially large factors (that is, factors growing roughly like e^n) [3], in which case the strong Church's thesis would be false.

Determining the space (measured as number of bits) required to perform a computation within a constant factor.

These concerns are generic ones in computing and are not tied to nanocomputing specifically, but they can be used in that context as well. However, if one wishes a more precise model of the costs to perform a desired computation than can be provided by Turing machines, one must turn to other models.

    3.2. RAM Machine

One limitation of the Turing machine was that since the memory tape was laid out serially in only one dimension, merely traversing it to arrive at a desired item consumed a large portion of the machine's time. For early electronic memories, in contrast, the time required for a signal to traverse the distance through the machine was negligible in comparison to the time taken to perform logic operations. Therefore, it was useful to model memory access as requiring only a single step, regardless of the physical distance to the desired memory location. This fast memory access model is the defining characteristic of the RAM or random-access machine model of computation [32]. The RAM model is occasionally called the von Neumann machine model, after the inventor of architectures having a central processing unit (CPU) with a separate random-access memory for storing programs and data. The RAM model is also sometimes extended to a parallel model called the PRAM.

Today, however, individual transistors have become so fast that the speed-of-light travel time across an ordinary-sized machine is becoming a significant limiting factor on the time required to access memory, especially for computations requiring large-scale supercomputers having large numbers of processors. For example, at the 3 GHz processor clock speeds that are now routinely available in commercial off-the-shelf microprocessors, light can travel only 10 cm in one clock cycle, so the memory accessible within a round-trip latency of one cycle is limited to, at most, the amount that will fit within a 5-cm radius sphere centered on the processor. (In practice, at present, the situation is even worse than this, because the time to access today's commercial memory technologies is much greater than the speed-of-light travel time.) And, when considering a wide-area distributed computation, communication halfway around the Earth (i.e., 20,000 km) requires at least 200 million clock cycles! Delays like these can be worked around somewhat by using architectural latency-hiding techniques in processor architectures and parallel algorithms, but only to a very limited extent [12, 37]. Furthermore, these problems are only going to get worse as clock speeds continue to increase. Communication time is no longer insignificant, except for the restricted class of parallel computations that require only very infrequent communication between processors, or for serial computations that require only small amounts of memory. For more general purposes, the RAM-type model is no longer tenable.

Slightly more realistic than the RAM are models that explicitly take communication time into account, to some extent, by describing a network of processing nodes or logic gates that pass information to each other along explicit communication links. However, depending on the topology of the interconnection network, these models may not be physically realistic either. Binary trees, fat trees, hypercubes, and butterfly or omega networks are all examples of interconnection patterns in which the number of locations accessible within n hops grows much faster than n^3, and therefore, these networks are impossible to implement with unit-time hops above a certain scale within ordinary 3D space. The only scalable networks in 3D space are the locally connected or mesh-type networks, and subgraphs of these [12, 17].

    3.3. Cellular Automata

Cellular automaton (CA) models, also originally due to von Neumann [33], improve upon the RAM-type or abstract-network model in that they explicitly recognize the constraints imposed by communication delays through ordinary Euclidean space. CAs are essentially equivalent to mesh-interconnected networks of fixed-capacity processors. The one-dimensional and two-dimensional CA variations are entirely physically realistic, and the 2D CA can be used as a starting point for developing a more detailed theoretical or engineering model of today's planar circuits, such as, for example, the VLSI theory of Leiserson [30].

However, ordinary CAs break down physically when one tries to extend them to three dimensions, because the entropy that is inevitably produced by irreversible operations within a 3D volume cannot escape quickly enough through the 2D surface. To circumvent this constraint while still making some use of the third dimension requires avoiding entropy production using reversible models, such as we will discuss in Section 4.1. These models can be shown to have better cost-efficiency scaling than any physically possible nonreversible models, even when taking the overheads associated with reversibility into account [4].

Finally, all of these models, in their traditional form, miss the opportunity afforded by quantum mechanics of allowing machine states that are superpositions (weighted combinations) of many possible states, within a single piece of hardware, which apparently opens up drastic shortcuts to the solution of at least certain specialized types of problems. We will discuss quantum models further in Section 4.2.


    4. NEW MODELS OF NANOCOMPUTERS

Computer technology already is forced to contend with the limits to communication delays imposed by the speed-of-light limit. Over the next 20-50 years, we can expect the limits that thermodynamics and quantum mechanics place on bit energies, sizes, and bit-device speeds to become plainly manifest as well. Other fundamental constraints, such as the one that gravity imposes on machine size (namely, that any sufficiently large 3D computer with a fixed energy density will collapse under its own gravity to form a black hole), are still very far away (probably many centuries) from being relevant.

So what we want is to have a model of computation that is physically realistic, at least with respect to the relatively near-term constraints, and that also provides a cost efficiency that scales as well as possible with increasing computation size, for all classes of computations. We can argue that there is an existence proof that such a model must be possible, for the laws of physics themselves comprise such a model, when looked at from a computational perspective. However, raw physics does not provide a very convenient programming model, so our task is to develop a higher level programming model that scales as well as possible, while also providing a relatively easy-to-understand, comprehensible framework for programming.

In Sections 4.1 and 4.2 we survey a couple of new classes of models which attempt to make progress toward this goal, namely the reversible and quantum models.

    4.1. Reversible Computing

The fundamental insight of reversible computing is that there is absolutely nothing about fundamental physics that in any way requires that the free energy that goes into bit manipulations must be discarded after each operation, in the form of waste heat. This is because bits that have been computed are not (yet) entropy, because they are derived from, and thus correlated with, other bits that are present in the machine. Present-day computers constantly discard temporary results that are no longer needed, in order to free up storage space for newer information. This act of wanton erasure causes the old information and energy associated with those bits to be relegated to the degraded status of effectively becoming entropy and heat, respectively. Once this is done, it cannot be undone, since the second law of thermodynamics tells us that entropy can never be destroyed. Information erasure is irreversible [19].

However, we can avoid this act of trashification of bits by instead recycling bits that are no longer needed, by taking advantage of their redundancy with other bits present in the machine to restore the unneeded bits to a standard state (say, 0 for an empty memory cell), while leaving the bits' associated energy (or most of it, anyway) in the machine, in the form of free energy which can go on to perform another useful computational operation [38].

    4.1.1. Adiabatic Principle

Of course, no machine is perfect, so even in a reversible machine, some of the kinetic energy associated with the performance of each operation goes astray. Such events are called adiabatic losses. The detailed accounting of adiabatic losses can be proven from basic quantum theory, as in the adiabatic theorem [39], which tells us that as a system proceeds along a given trajectory under the influence of slowly changing externally applied forces, the total energy dissipation is proportional to the speed with which the external forces change; however, rather than getting into the technical mathematical details of this theorem here, we discuss some more intuitive ways to understand it.

First, the amount of adiabatic loss is roughly proportional to the number of elementary quantum operations performed, and thus to the energy involved in carrying out the transition, times the time over which it is performed, divided by a technology-dependent constant that specifies the quantum quality factor of the system, that is, how many quantum operations can be performed on average without an error (decoherence event).

As the speed of carrying out a given transformation is decreased, the kinetic energy associated with the system's motion along the desired trajectory through configuration space decreases quadratically (in proportion to the square of the speed, since, as we all know, kinetic energy is (1/2)mv^2), and so the total adiabatic losses over the entire motion decrease in inverse proportion to the time taken for the transformation.
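A compact way to see this scaling (a sketch in my own notation, under the assumption stated later in this section that a fixed fraction of the kinetic energy is lost per unit time): if a transition of fixed length is performed in time t, the kinetic energy scales as E_k ∝ 1/t^2, and the loss accumulated over the transition is

$$ E_{\mathrm{diss}} \;\sim\; \frac{E_k}{t_{\mathrm{coh}}}\, t \;\propto\; \frac{1}{t^{2}}\cdot\frac{t}{t_{\mathrm{coh}}} \;=\; \frac{1}{t\,t_{\mathrm{coh}}}, $$

where t_coh is the relevant coherence (quality-factor) time; that is, the total adiabatic loss falls off inversely with the time allotted, until it becomes comparable to the dissipation of the static bit energies discussed next.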

However, when the kinetic energy involved in carrying out transformations decreases to a level that is close to the static bit energies themselves, further decreases in speed do not help, because entropy generation from degradation of the static bits comes to dominate the total dissipation. That is, some of the energy whose job it is to maintain the very structure of the machine, and/or the state of its stored information, also leaks out, in a continual slow departure from the desired configuration (this is called decay), which must be periodically repaired using correction mechanisms if the computation is to continue indefinitely. For example, all of the following phenomena can be considered as simply different examples of decay processes:

Charge leakage from DRAM (dynamic RAM) cells, requiring periodic refreshing.

Bit errors due to thermal noise, cosmic ray impacts, etc., requiring the use of error-correction algorithms.

Decoherence of quantum bits from various unwanted modes of interaction with a noisy, uncontrolled environment, requiring quantum error correction.

Gradual diffusion of the atoms of the devices into each other (e.g., from electromigration), leading to eventual failure requiring remanufacture and replacement of all or part of the machine.

All of these kinds of decay processes incur a cost in terms of free energy (to periodically correct errors, or to repair or replace the machine) that is proportional to the spacetime usage, or space to hold bits times time occupied, of the computation. This spacetime usage cannot be adequately reduced to time alone, space alone, or even to the number of logical operations alone, since, depending on the computation to be performed, not all bits may be actively manipulated on every time step, and so the spacetime usage may not be simply proportional to the number of operations.


Adiabatic (or kinetic) losses, on the other hand, do effectively count the number of operations performed, but these are quantum operations, whose number is not necessarily directly proportional to the number of classical bit operations, even when the algorithm being carried out is a classical one. This is because the number of quantum operations involved in carrying out a given classical bit operation increases in proportion to the speed with which the desired trajectory through state space is followed.

There are two ways to see this. First, the de Broglie wavelength of the particle wave packet representing the system's state in configuration space is inversely proportional to its momentum, according to the formula λ = h/p. Momentum is proportional to velocity, so following a given trajectory will involve a larger number of distinct transitions of the system's wave packet (i.e., translations through about a wavelength) the faster it is done; each of these can be considered an orthogonalizing quantum operation.

Second, recall that kinetic energy increases with the square of velocity, whereas the frequency or quickness with which a fixed-length classical trajectory is followed increases only linearly with velocity. Therefore, the interpretation of energy as the rate of quantum operations requires that the number of operations on a given trajectory must increase with the speed at which that trajectory is followed.

With this interpretation, the technology-dependent coefficients (such as frictional coefficients, etc.) that express the energy dissipation per unit quickness for an adiabatic process can be seen as simply giving the decoherence times for those qubits whose transitioning corresponds to kinetic energy. The decoherence of qubits carrying energy causes the dissipation of that energy. The adiabatic principle (which states that the total energy dissipation of an adiabatic process is proportional to its speed) can be derived from the postulate that a fixed fraction of kinetic energy is dissipated each time unit [22]. Adiabatic coefficients are therefore lower bounded by the decoherence rates that can be achieved for qubits whose transitions carry us from one logical machine state to another.

The adiabatic principle also tells us that whenever logical transitions are carried out by a process that uses multiple quantum operations (in place of a single one), we are doing extra unnecessary work and thus generating more entropy (and energy dissipation) than necessary. This happens whenever we try to do a process faster than strictly necessary.

As a simple example, consider a hollow cylinder of radius r and mass m, rotating with rim velocity v. Let us consider a rotation of this wheel to carry out a cycle in our computer, a complete transition from one logical state to another. A simple calculation shows that the number of quantum orthogonal transitions (angle-π/2 rotations of the state vector in Hilbert space) that occur during one complete rotation is given by 4L/ℏ, where L = mvr is the wheel's angular momentum about its axis, and ℏ is Planck's (reduced) constant, h/2π. Total angular momentum for any system is quantized, and the minimum possible rotation speed occurs when L = ℏ. At this speed, the kinetic energy is just enough to carry out one quantum logic operation (an iop, for example, a bit toggle) per quarter-cycle. At this rate, the rotation of the wheel through a quarter-turn is, from a quantum mechanical perspective, a bit flip. The decoherence rate of this spin qubit determines the rate at which the wheel's energy dissipates.
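Plugging numbers into the formula makes the point vivid (the wheel parameters below are my own illustrative choices):

$$
N \;=\; \frac{4L}{\hbar}, \qquad L = mvr.
$$

A macroscopic flywheel with m = 1 g, r = 1 cm, and v = 1 m/s has L = 10⁻⁵ J·s ≈ 9.5 × 10²⁸ ℏ, so each revolution comprises N ≈ 3.8 × 10²⁹ orthogonalizing quantum operations, whereas at the quantum-mechanical minimum L = ℏ there are just N = 4, one per quarter-turn.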

In contrast, if the wheel were spun faster (L were higher), there would be proportionally more distinct rotational positions around one complete rotation, and the total energy would be quadratically higher, so the average energy per location (or the generalized temperature) is proportional to L. With order-L more locations, each carrying order-L more energy, a fixed decoherence rate per location yields a quadratically higher total rate of energy dissipation, and thus a linearly higher amount of entropy generation per complete cycle. This is an example of why the dissipation of an adiabatic process is proportional to the speed at which it is carried out.

Simply put, a faster process has quadratically greater kinetic energy and so, given a fixed mean-free time or decoherence time for that energy, energy dissipates to heat at a quadratically faster rate, for linearly more energy dissipation during the time of the operation.
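The same scaling in symbols (a sketch, with t_op the transition time and t_dec the decoherence time of the kinetic-energy-bearing qubits):

$$
E_k \;\propto\; v^2 \;\propto\; \frac{1}{t_{\mathrm{op}}^2},
\qquad
E_{\mathrm{diss}} \;\sim\; E_k\,\frac{t_{\mathrm{op}}}{t_{\mathrm{dec}}} \;\propto\; \frac{1}{t_{\mathrm{op}}\,t_{\mathrm{dec}}},
$$

so the energy dissipated per operation scales inversely with the transition time, that is, linearly with speed; this is the same scaling as the familiar E_diss ≈ (RC/t_op)·CV² rule for adiabatically charging a capacitive circuit node through a resistance.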

The minimum energy dissipation of an adiabatic process occurs when the speed of the transition is slow enough that the dissipation of kinetic energy is not much greater than the dissipation of static (potential) energy. If the decoherence rates are comparable for the two types of energy, then the kinetic energy for bit change should be of the same order as the static energy in the bits themselves, as in our previous wheel example.

This makes sense, since if energy is computing, we want as much as possible of our available energy to be actively engaged in carrying out transitions at all times. Having a kinetic energy that is much larger than the bit energy would mean that there was a lot of extra energy in the system that was not directly occupied in carrying out bit transitions. In such cases, a more direct and economical design would be preferable. This is what a good optimized adiabatic design attempts to accomplish.

    4.1.2. Device Implementation Technologies

Reversible, adiabatic logical mechanisms can be implemented in a wide variety of physical systems; indeed, nearly every type of bit-device technology that has been proposed (whether electrical, mechanical, or chemical) admits some sort of reversible variant. Virtually all of these technologies can be usefully characterized in terms of the bistable potential-well paradigm introduced by Landauer [19]. In this framework, a bit is encoded by a physical system having two distinct metastable states. The relative energies of these states, and the height of a potential barrier between them, are adjustable by interactions with nearby bits. Figure 1 illustrates this model.

Irreversible erasure of a bit in this model corresponds to lowering a potential-energy barrier (e.g., by turning on a transistor between two circuit nodes) regardless of the state of the bit under consideration (say, the voltage on one of the nodes). Due to the energy difference between biased states, this in general leads to large, nonadiabatic losses (thick red arrows in the diagram), which reversible logic must avoid. Even the lowering of a barrier between two states of equal energy still creates at least one bit's worth of entropy, even when done infinitesimally slowly, if the state of the bit was not already entropy (medium red arrows).
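For scale, the free-energy cost implied by one bit's worth of entropy (the Landauer bound; T = 300 K is my own illustrative choice) is

$$
\Delta S \;\ge\; k\ln 2,
\qquad
E_{\mathrm{diss}} \;\ge\; kT\ln 2 \;\approx\; (1.38\times10^{-23}\,\mathrm{J/K})(300\,\mathrm{K})(0.693) \;\approx\; 2.9\times10^{-21}\,\mathrm{J} \;\approx\; 18\,\mathrm{meV}
$$

per bit erased at room temperature, which is several orders of magnitude below the switching energies of conventional CMOS logic.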


[Figure 1 diagram: six bistable-well configurations arranged by barrier height (raised above, lowered below) and by the direction of the applied bias force, with occupied-state positions labeled 0, N, and 1, and thermal/tunneling leakage pathways marked "leak"; superposition states are ignored.]

Figure 1. Possible adiabatic (green) and nonadiabatic (red) transitions between states of any device technology that provides a generic bistable potential well. Each box indicates a different abstract configuration of the system. Within each box, the x axis is some continuous state variable in the system (such as the position of a mechanical component or a charge packet), and the y axis is potential energy for the given value of the state variable. Small black horizontal lines show the energy level occupied (or the surface of a set of occupied levels). The x position of the occupied state encodes the value of the bit. Device configurations encoding logical values of 0, 1, and an in-between neutral level N are shown. Thick arrows between configurations indicate nonadiabatic active transitions, while thin arrows indicate possible leakage pathways (activated thermally or by tunneling). Note the lower three boxes show the potential barrier lowered, while the upper three show it raised. The left-right position of the box in the diagram corresponds roughly to the direction of an external force (e.g., from a neighboring device) that is applied to the system.

Of course, even in reversible logic systems, we must still contend with the smaller losses due to thermally excited transitions or tunneling of the bit system's state over the potential energy barriers (thin red arrows labeled "leak").

Now, a number of different reversible logical and storage mechanisms are possible within this single framework. We can categorize these as follows (a schematic summary in code appears after the list):

1. Input-bias, clocked-barrier latching logic (type I): In this method, the device barriers are initially lowered, and input data to the device apply a bias, pulling the device toward a specific bit value. Optionally in this stage, forces applied by several input bits can be combined together to carry out majority logic (or switch gates can be used to do logic, as in [40]). Next, a timing signal raises the barrier between the two states. This step can also serve to amplify and restore the input signal. After the barrier is raised, the input data can be removed, and the computed state of the device remains stored. Later, the stored data can be reversibly unlatched, if desired, by the reverse sequence of steps.

Specific physical examples of the type I technique include the adiabatic quantum dot cellular automaton of Lent and Tougaw [41], the complementary metal-oxide semiconductor (CMOS) transmission-gate latch of Younis and Knight [42], the reversible rod logic latch of Drexler [43], the reversible superconducting Parametric Quantron logic of Likharev [44], the mechanical buckled logic of Merkle [45], and the electronic helical logic of Merkle and Drexler [40].

2. Input-barrier, clocked-bias retractile logic (type II): In this technique, the input data, rather than applying a bias force, conditionally raise or lower the potential energy barrier. Arbitrary AND/OR logic can be done in this stage, by using series/parallel combinations of several barriers along the path between bit states. Then, a timing signal unconditionally applies the bias force, which either pushes the system to a new state, or not (depending on whether the barrier between states was raised). Since the output state is not inherently latched by this timing signal, the input signal cannot then be removed (if this would lower the barriers) until other downstream devices have either latched a copy of the output or have finished using it. So these gates cannot by themselves perform sequential logic (e.g., memory, networks with feedback, or pipelined logic), although they can be used in combination with latching gates to do so.

Examples of the type II technique are Hall's retractile electronic logic [46], the nonlatching portions of Younis and Knight's CMOS SCRL gates [42], and Drexler's rod logic interlocks [47].

3. Input-barrier, clocked-bias latching logic (type III): Finally, from Figure 1, one can immediately see that there is a third possibility, one that has not previously been considered. It is more efficient than either (1) or (2), in the sense that it combines AND/OR logic with latching in a single operation. In this scheme, as with the previous one, the input signal sets the barrier height, doing logic in series/parallel networks, and then a timing signal applies an unconditional bias force. But we note that the input signal can now be immediately removed, if in doing so we restore the barriers to a null state that consists of barriers unconditionally raised. Then, when the bias timing signal is removed, the output bit remains latched into its new state. The bit can be unlatched using the reverse sequence along the same path, or a different one.

This general method for doing reversible logic apparently has not been previously described in the literature, but we have developed a straightforward technique for implementing this model in standard CMOS technology. It is significantly more efficient than previous truly adiabatic logic families, by several different metrics.
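To make the ordering of data and clock actions in the three schemes concrete, here is a schematic sketch (the step lists are my own paraphrase of the descriptions above, not an implementation of any particular device technology):

```python
# Each function returns the sequence of (actor, action) steps for one cycle of
# the corresponding reversible clocking discipline from the list above.

def type_i_latch(input_value):
    """Type I: input applies bias, clock raises barrier, input may then leave."""
    return [
        ("data",  f"apply bias toward {input_value} (barriers start lowered)"),
        ("clock", "raise barrier (latches, amplifies, and restores the value)"),
        ("data",  "remove input bias (computed state remains stored)"),
    ]

def type_ii_retractile(inputs):
    """Type II: inputs set barriers (series/parallel AND/OR), clock applies bias;
    no latching, so inputs must be held until downstream consumers finish."""
    return [
        ("data",  f"raise/lower barriers according to inputs {inputs}"),
        ("clock", "apply unconditional bias (state moves only where barrier is low)"),
        ("data",  "inputs must REMAIN until the output is no longer needed"),
        ("clock", "retract bias, then retract inputs, in reverse order"),
    ]

def type_iii_latch(inputs):
    """Type III: like type II, but the null input state has all barriers raised,
    so inputs can be removed immediately and the output stays latched."""
    return [
        ("data",  f"lower selected barriers according to inputs {inputs}"),
        ("clock", "apply unconditional bias"),
        ("data",  "remove inputs (barriers return to the raised null state)"),
        ("clock", "remove bias (output remains latched behind raised barrier)"),
    ]

for name, trace in [("type I", type_i_latch(1)),
                    ("type II", type_ii_retractile(["a", "b"])),
                    ("type III", type_iii_latch(["a", "b"]))]:
    print(name)
    for actor, action in trace:
        print(f"  {actor}: {action}")
```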

The bistable potential-well model is basically one in which we model one subsystem (the output bit) as being subject to a time-varying Hamiltonian (essentially, a matrix representing the forces on the system) that is provided by the device's interaction with some other subsystems (the input and clock bits). However, one must stay aware that a closer look at the situation would reveal that there is really just a single system that evolves according to an actual underlying Hamiltonian which is time-independent, being itself just an expression of the unchanging laws of physics. So, in general, one cannot ignore the back-reaction of the controlled system (the output bit) on the system that is doing the controlling (the input bit), especially if we want to be accurate about the total energy consumption in the entire system, including the controlling system, which in practice is just some other logic element within the computer. For example, it is easy to design adiabatic gates using transistors that dissipate almost no energy internally, whose transitions are


controlled by some large external signal generator. But it is much harder to integrate the signal generator and design a complete, self-timed system that is nevertheless almost reversible. For this reason, some authors have unjustly criticized the concept of adiabatic circuits in general, solely on the basis of the poor quality of the particular signal generators that were assumed to be used in those authors' particular analyses. But the failure of a single short-sighted designer to imagine an efficient resonator does not mean that such a thing is impossible, or that alternative designs that avoid these timing-system inefficiencies are impossible to build.

For example, Bennett [2] illustrated a proof-of-concept mechanical model of a self-contained reversible computer by doing away with directed kinetic energy entirely, and instead letting the system take a random walk, forward or backward, along its nonbranching path through configuration space. Unfortunately, doing away with kinetic energy entirely is not such a good thing, because random walks are very slow; they take expected time that increases quadratically with the distance traveled. (Chemical computing techniques in which communication relies primarily on molecular diffusion in solution also have performance problems, for the same reason.) Thus, we would prefer to stick with designs in which the system does have a substantial net kinetic energy and momentum along the forward direction of its motion through configuration space, while yet dissipating much less than kT energy per logic operation performed, due to a high quality factor for the individual logic interactions. We call such trajectories ballistic.
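The quadratic penalty of diffusive motion is easy to check numerically; here is a small Monte Carlo sketch (the reflecting-boundary walk is my own toy stand-in for a randomly walking computation):

```python
import random

def mean_hitting_time(n_steps, trials=1000):
    """Average time for an unbiased walk (reflecting wall at 0) to first reach n_steps."""
    total = 0
    for _ in range(trials):
        pos, t = 0, 0
        while pos < n_steps:
            pos = max(pos + random.choice((-1, +1)), 0)
            t += 1
        total += t
    return total / trials

for n in (10, 20, 40):
    print(n, round(mean_hitting_time(n)))  # roughly quadruples each time n doubles,
                                           # versus time ~ n for a ballistic trajectory
```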

I emphasize that we still know of absolutely no fundamental reasons why ballistic reversible computation cannot be done, and with arbitrarily high quality factors. A series of increasingly realistic models have been proposed that illustrate steps along the path toward actually doing this. Fredkin and Toffoli [34] described a ballistic billiard ball model (BBM) of reversible computation based on collisions between idealized elastic spheres. Their original model contained chaotic instabilities, although these can be easily fixed by confining the balls to travel along valleys in configuration space, and by keeping them time-synchronized through a few additional mechanical interactions. A concrete example of an electronic model similar to the BBM that avoids these instability problems entirely is described in [40]. Pure-mechanical equivalents of the same synchronization mechanism are also straightforward to design.

Now, the BBM was primarily just a classical model. Richard Feynman and Norm Margolus made some initial theoretical progress in devising a totally quantum-mechanical model of ballistic reversible computation, Feynman with a serial model [48], and Margolus with a self-synchronized parallel model [49]. However, Margolus' model was restricted to allowing parallelism in only one dimension, that is, with only a linear chain of active processors at any time. As of this writing, we do not yet have an explicit, fully detailed, quantum physical model of totally three-dimensional parallel reversible computing. But we know that it must be possible, because we can straightforwardly design simple classical-mechanical models that already do the job (essentially, reversible clockwork) and that do not suffer from any instability or timing-system inefficiency problems. Since these models are manifestly physically realizable (obviously buildable, once you have conceived them), and since all real mechanical systems are, at root, quantum mechanical, a detailed quantum-mechanical fleshing out of these classical-mechanical models, if mapped to the nanoscale, would fit the bill. But significant work is still needed to actually do this translation.

Finally, in general, we should not assume that the best reversible designs in the long run necessarily must be based on the adiabatic bistable-potential-well-type model, which may be unnecessarily restrictive. A more general sort of model of reversible computing consists simply of a dynamic quantum state of the machine's moving parts (which may be just spinning electrons) that evolves autonomously according to its own built-in physical interactions between its various subparts. As long as the evolution is highly coherent (almost unitary), and we can accurately keep track of how the quantum state evolves over time, the dissipation will be kept small. The devil is in figuring out the details of how to build a quantum system whose built-in, internal physical interactions and natural evolution will automatically carry out the desired functions of logic, communication, and synchronization, without needing a full irreversible reset of the state upon each operation. But as I stated earlier, we can be confident that this is possible, because we can devise simple mechanical models that already do the job. Our goal is just to translate these models to smaller-sized and higher-frequency physical implementations.

    4.1.3. Algorithmic Issues

For now, we take it as given that we will eventually be able to build bit devices that can operate with a high degree of thermodynamic reversibility (i.e., very small entropy generation per operation), and that the details of ballistic signal propagation and timing synchronization can also be worked out. What, then, can we do with these devices?

One key constraint is that physically reversible devices are necessarily also logically reversible; that is, they perform invertible (one-to-one, bijective) transformations of the (local) logical machine state.
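As a concrete illustration, the Toffoli (controlled-controlled-NOT) gate is a standard logically reversible primitive; the quick bijectivity check below is my own sketch:

```python
from itertools import product

def toffoli(a, b, c):
    """Flips c exactly when a and b are both 1; leaves a and b unchanged."""
    return a, b, c ^ (a & b)

def erase(a):
    """Unconditional erasure: maps both inputs to 0, so it is not invertible."""
    return 0

# Toffoli is a bijection on the 8 possible 3-bit states...
assert len({toffoli(*bits) for bits in product((0, 1), repeat=3)}) == 8
# ...whereas unconditional erasure merges distinct states:
assert len({erase(b) for b in (0, 1)}) == 1
```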

Debunking Some Misunderstandings

First, the preceding statement (physical reversibility requires logical reversibility) has occasionally been questioned by various authors, due to some misunderstanding either of physics, or of the meaning of the statement, but it is not really open to question! It is absolutely as certain as our most certain knowledge about physical law. The reversibility of physics (which follows from the unitarity of quantum mechanics, or from the mathematical form of all Hamiltonian dynamical systems in general) guarantees that physics is reverse-deterministic, that is, that a given (pure, quantum) state of a closed system can only have been arrived at along a specific, unique trajectory. Given this fact, any time we have a mechanism that unconditionally erases a bit (i.e., maps both of two distinct initial logical states onto a single final logical state), without regard to any other knowledge that may be correlated to that bit's state, there must be a compensatory splitting of some other adjoining system from one state into two distinct states, to carry off the erased information, so that no merging of possible state trajectories happens in the system overall. If the state distinction happens to become lost


somewhere, then it is, by definition, entropy. If the erased bit was itself not already entropy before the erasure, then this entropy is furthermore newly generated entropy, and the mechanism is then by definition not physically reversible. In contrast, if the state information remains explicitly present in the logical state that is transformed in the operation, then the mechanism is by definition logically reversible.

If you think that you have found a way to unconditionally erase known logical bits without generating new entropy (without having also disproved quantum mechanics in the process), then check your work again, a lot more carefully! Such a device would be exactly equivalent to the long-debunked perpetual motion machine [50]. I will personally stake my reputation on your having made a mistake somewhere. For example, did you analyze the erasure of a logical bit that was uncorrelated with any other bits and thus was already entropy? No new entropy need be generated in that case. Or did you just reversibly decrement a long counter until it was zero, but you forgot that an exactly equal amount of timing information about when you finished counting, relative to other subsystems, is still present and still must be got rid of in order to interact with outside systems? Or did you rely on an exponential number of high-energy states all decaying to a single slightly lower energy state, while forgetting that the exponentially larger number of higher energy states and slow decay rate will mean that there is an exactly equal chance to go the other way and make a transition to occupy the high-energy state instead, excited by thermal noise? These issues are rather subtle, and it is easy to make mistakes when proposing a complicated new mechanism. In contrast, the impossibility proof from basic, uncontroversial axioms of physics is straightforward and easily verified to be correct. If you want to erase logical bits without generating entropy, you will first have to go back and show why most of the basic physics that has been learned in the last 150 years (such as statistical mechanics and quantum mechanics) must be totally wrong, despite agreeing perfectly with all our myriad experiments!

Algorithmic Overheads

Now, those misconceptions aside, let us take a clear look at the algorithmic issues that result from the need for logical reversibility. (These issues apply equally whether the logic of the particular computation is implemented in hardware or in software.) Apparently, the requirement of logical reversibility imposes a nontrivial constraint on the types of algorithms we can carry out. Fortunately, it turns out that any desired computation can be emulated by a reversible one [38], although apparently with some algorithmic overheads, that is, with some increases in the number of bits or ops required to carry out a given computation.

First, some history: Rolf Landauer of IBM realized, as early as 1961 [19], that information-destroying operations such as bit erasure, destructive (in-place) AND, and so forth can be replaced by analogous reversible operations that just move the erased information somewhere else rather than really erasing it. This idea is now known as a Landauer embedding of an irreversible computation into a reversible one. But, at the time, Landauer thought this approach would pointlessly lead to an accumulation of unwanted information in storage that would still have to be irreversibly erased eventually. In 1963, Yves Lecerf [51], working independently on a related group theory problem, reinvented the Landauer embedding and furthermore noted that the intermediate results could also be reversibly uncomputed by undoing the original computation. We call this idea Lecerf reversal. In 1973, Charles Bennett independently rediscovered Lecerf reversal [38], along with the observation that desired results could be reversibly copied (known as a Bennett copy) before undoing the computation. Simple as it was, this was the final key insight showing that computation did not necessarily require entropy generation, as it revealed that a given area of working storage could indeed be reversibly reused for multiple computations that produce useful results. All of these ideas were independently rediscovered in a Boolean logic circuit context by Ed Fredkin and Tom Toffoli during their work on information mechanics at MIT in the late 1970s [34].
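Schematically, the Landauer embedding, Bennett copy, and Lecerf reversal combine into the familiar compute-copy-uncompute pattern; in the toy sketch below, the function f and the XOR-based encoding are my own stand-ins chosen purely for illustration:

```python
# Toy reversible register machine: the state is (input x, scratch s, output y),
# and every primitive used is a bijection on that state, so nothing is erased.

def f(x):                      # an ordinary (irreversible) function to emulate
    return (3 * x + 1) % 16

def xor_in_f(x, s):
    """Landauer embedding of computing f: (x, s) -> (x, s XOR f(x))."""
    return x, s ^ f(x)

def bennett_compute(x):
    s, y = 0, 0
    x, s = xor_in_f(x, s)      # 1. compute: scratch now holds f(x) ("garbage")
    y ^= s                     # 2. Bennett copy: XOR the result into a clean output
    x, s = xor_in_f(x, s)      # 3. Lecerf reversal: uncompute, scratch returns to 0
    assert s == 0              # scratch is clean again and can be reused
    return x, y

print(bennett_compute(6))      # -> (6, 3): input preserved, result delivered
```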

Unfortunately, although the extra storage taken up by the Landauer embedding can be reused, it does still at least temporarily take up space in the computation and thus contributes to economic spacetime costs. Even this temporary usage was shown not to be technically necessary by Lange et al. [52], who showed how to compute reversibly in general using virtually no extra space, essentially by iterating through all possible machine histories. (The technique is very similar to earlier complexity theory proofs showing that deterministic machines can simulate nondeterministic machines in place, using the same space [53].) But unfortunately, the Lange et al. technique is not at all close to being practical, as it generally requires taking an exponentially longer time to do the computation.

Today, we know of a continuous spectrum of possible trade-offs between algorithmic space and time overheads for doing reversible computation. In 1989, Bennett described a space of trade-offs between his original technique and a somewhat more space-efficient one [54], and later work by Williams [55] and Buhrman et al. [56] showed some ways to extrapolate between the Bennett-89 approach and the Lange et al. one. However, these latter approaches are apparently not significantly more efficient in terms of overall spacetime cost than the older Bennett-89 one. It is, however, possible to reduce the spacetime cost of the Bennett-89 algorithm by simply increasing its degree of parallelization (activity factor, hardware efficiency) so that less time is spent waiting for various subcomputations. But this apparently gives only a small improvement also.
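For a rough quantitative feel, the commonly quoted analysis of Bennett's 1989 checkpointing strategy, with k segments per level and n levels of recursion (the formulas below are the standard approximate ones, not derived in this chapter), can be tabulated as follows:

```python
def bennett89_overheads(k, n):
    """Reversibly emulate T = k**n segments of an irreversible computation."""
    T = k ** n                            # irreversible time being emulated
    time_factor = ((2 * k - 1) / k) ** n  # approx. reversible time / irreversible time
    checkpoints = n * (k - 1) + 1         # approx. simultaneous saved configurations
    return T, time_factor, checkpoints

for k, n in [(2, 10), (4, 5), (10, 3)]:
    T, tf, cp = bennett89_overheads(k, n)
    print(f"k={k:2d} n={n:2d}  T={T:5d}  time overhead ~x{tf:5.1f}  checkpoints ~{cp}")
```

Larger k trades extra checkpoint space for a smaller time overhead, which is the kind of spectrum of space-time trade-offs referred to above.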

At present, we do not know how to reversibly emulate arbitrary computations on reversible machines without, at least, a small polynomial amount of spacetime overhead. It has been conjectured that we cannot do better than this in general, but attempts (such as [57]) at a rigorous proof of this conjecture have so far been inconclusive. So, as far as we know right now, it is conceivable that better algorithms may yet be discovered. Certainly, we know that much more efficient algorithms do exist for many specific computations, sometimes with no overhead at all for reversible operation [4]. But Bennett's algorithm and its variations are the best we can do currently for arbitrary computations.

Given the apparent algorithmic inefficiencies of reversible computing at some problems, as well as the limited quality factor of reversible devices even for ideal problems, it does not presently make sense to do all computations in a totally


reversible fashion. Instead, the computation should be broken into pieces that are internally done reversibly, but with occasional irreversible logical operations in between them. The size of the reversible pieces should be chosen so as to maximize the overall cost efficiency, taking into account both the algorithmic overheads and the energy savings, while optimizing the parameters of the emulation algorithms used. Examples of how to do this via analytical and numerical methods are illustrated in [5, 21, 58]. That work shows that reversible computing remains advantageous for overall cost efficiency, despite its algorithmic overheads, even for worst-case types of computations, for which we do not know of any better reversible algorithm than Bennett's.

In cases where the reversible devices have a high quantum quality q, it turns out to make sense to do a substantial amount of reversible computing in between irreversible operations. Since the most efficient use of these reversible computing resources may, in some cases, call for a hand-optimized reversible algorithm, the architecture and programming model should best include directives that directly expose the underlying reversibility of the hardware to the programmer, so that the programmer (as algorithm designer) can perform such optimizations [4]. In other words, the algorithm designer should be able to write code that the architecture guarantees will be run reversibly, with no unnecessary overheads. Some examples of architectures and programming languages that allow for this can be found in [4, 59, 60].

Can the machine's potential reversibility be totally hidden from the programmer while still preserving asymptotic efficiency? Apparently not. This is because the best reversible algorithm for a given problem may in general have a very different form than the best irreversible algorithm [4]. So we cannot count on the compiler to automatically map irreversible code to the best reversible code, at least, given any compiler technology short of a very smart general-purpose artificial intelligence which we could rely on to invent optimal reversible algorithms for us.

How hard is it to program in reversible languages? At first it seems a little bit strange, but I can attest from my own experience that one can very quickly become used to it. It turns out that most traditional programming concepts can be mapped straightforwardly to corresponding reversible language constructs, with little modification. I have little doubt that shortly after reversibility begins to confer significant benefits to the cost efficiency of general-purpose computation, large numbers of programmers, compiler writers, etc. will rush to acquire the new skills that will be needed to harness this new domain of computing at the edge of physics.

However, even reversible computing does not yet harness all of the potential computational power offered by physics. For that, we must turn our sights a bit further beyond the reversible paradigm, toward quantum computing.

    4.2. Quantum Computing

4.2.1. Generalizing Classical Computing Concepts to Match Quantum Reality

The core fact of quantum mechanics is that not all of the conceivable states of a physical system are actually operationally distinguishable from each other. However, in the past, computer scientists artificially constrained our notion of what a computation fundamentally is, by insisting that a computation be a trajectory made up of primitive operations, each of which takes any given legal state of the machine and changes it (if at all) to a state that is required to be operationally distinguishable from the original state (at least, with very high probability). For example, a traditional Boolean NOT gate, or logical inverter, is designed to either leave its output node unchanged (if its input remains the same), or to change its output to a new state that is clearly distinct from what it was previously.

But really this distinguishability restriction on operations was an extra restriction that was, in a sense, arbitrarily imposed by early computer designers, because this type of operation is not the only type that exists in quantum mechanics. For example, when a subatomic particle's spin is rotated 180° around its x axis, if the spin vector was originally pointing up (+z), it will afterward be pointing down (−z), and this state is completely distinct from the up state. But if the spin was originally pointing at an angle halfway between the x and z axes, then this operation will rotate it to an angle that is only 90° away from its original angle (namely, halfway between the x and −z axes). This spin state is not reliably distinguishable from the original. Although it is at right angles to the original state in 3D space, its state vector is not orthogonal to the original state in the Hilbert space of possible quantum state vectors. Orthogonality in Hilbert space is the technical quantum-mechanical requirement for distinguishability of states. Thus, the operation "rotate a spin by 180° around a given axis" does not change all states by orthogonalizing amounts, only some of them.
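This example is easy to check numerically with the standard spin-1/2 rotation operator R_x(θ) = exp(−iθX/2); the short script below is my own sketch and is not specific to this chapter:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)   # Pauli matrices
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def rot_x(theta):
    """Spin-1/2 rotation by angle theta about the x axis."""
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * X

def spin_state(nx, nz):
    """+1/2 eigenstate of the spin component along the unit vector (nx, 0, nz)."""
    vals, vecs = np.linalg.eigh(nx * X + nz * Z)
    return vecs[:, np.argmax(vals)]

up_z   = spin_state(0.0, 1.0)                         # pointing up (+z)
tilted = spin_state(1 / np.sqrt(2), 1 / np.sqrt(2))   # halfway between +x and +z
R = rot_x(np.pi)                                      # rotate 180 degrees about x

print(abs(up_z.conj() @ R @ up_z))      # ~0.00: up -> down, fully distinguishable
print(abs(tilted.conj() @ R @ tilted))  # ~0.71: large overlap, NOT orthogonal
```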

What if we allow a computation to be composed of operations that do not necessarily change all legal states by an orthogonalizing amount? Then the new state after the operation, although possibly different from the original state, will not necessarily be reliably observationally distinguishable from it. However, if we do not try to observe the state but instead just let it continue to be processed by subsequent operations in the machine, then this lack of distinguishability is not necessarily a concern. It can simply be the situation that is intended. In essence, by loosening our requirement that every state of the computation be distinguishable from the previous one, we open up the possibility of performing new types of operations, and thus traversing new kinds of trajectories through state space, ones that were not previously considered to be legal computations.

Does this new possibility confer additional computational power on the model? We cannot prove that it does, but it is currently strongly believed to. Why? Because specific, well-defined algorithms have been found that use these new types of trajectories to perform certain kinds of computational tasks using exponentially fewer operations than with the best classical algorithms that we know of [3]. In other words, opening up this larger space of operations reveals drastic shortcuts through state space; that is, transformation trajectories that get us to where we want to go using exponentially fewer steps than the shortest classical paths that we know of.

The two most important examples of these apparent exponential shortcuts that have been found so far are the


following: (1) Shor's quantum factoring algorithm [61] and (2) simulations of quantum systems [62].

Shor's factoring algorithm can factor n-digit numbers using a number of quantum operations that increases only quadrat