statistical mechanics - cornell...
Statistical Mechanics
Contents
Chapter 1. Ergodicity and the Microcanonical Ensemble
1. From Hamiltonian Mechanics to Statistical Mechanics
2. Two Theorems From Dynamical Systems Theory
3. Ergodicity and the Microcanonical Ensemble
4. Quantum Mechanics and Density Matrices
5. The Validity of the Ergodic Hypothesis
6. Statistical Fluctuations
7. The Statistical Basis for Entropy
8. Thermodynamic Equilibrium
9. Does Entropy Increase?
10. Information Theory
References
CHAPTER 1
Ergodicity and the Microcanonical Ensemble
The pressure in an ideal gas, recall, is proportional to the average kinetic energy
per molecule. Since pressure may be understood as an average over billions upon
billions of microscopic collisions, this simple relationship illustrates how statistical
techniques may be used to suppress information about what each individual molecule
is doing in order to extract information about what the molecules do on average as
a whole. Our first task, as we examine the foundations of statistical mechanics, is to
understand more precisely why this suppression is necessary and how exactly it is
to be accomplished. We must, therefore, begin by considering
the laws of microscopic dynamics. In physics, there are two choices here — the laws
of classical mechanics and the laws of quantum mechanics. Remarkably, the choice
is not important; in either case, detailed solutions to the dynamical equations are
completely unnecessary. We will consider both cases, but follow the classical route
through Hamiltonian mechanics first, as this provides the clearest introduction to
the structure of statistical mechanics. In this section, we will review the essential
elements of Hamiltonian mechanics and discuss the need for and basic elements of
a probabilistic framework...
1. From Hamiltonian Mechanics to Statistical Mechanics
Newton’s second law for a particle of mass m,
F_total = m q̈,
is a second-order ordinary differential equation. Therefore, given the instantaneous
values of the particle’s position q and momentum p = m q̇ at some time t = 0, the
particle’s subsequent motion is uniquely determined for all t > 0. For this reason,
the state of a classical system consisting of n configurational degrees of freedom can
be thought of as a point (q1, . . . , qn, p1, . . . , pn) in a 2n-dimensional space called the
phase space of the system. As the state evolves in time, this point will trace out
in phase space a trajectory defined by the tangent vector,
(1) v(t) = (q̇1(t), . . . , q̇n(t), ṗ1(t), . . . , ṗn(t)).
A Hamiltonian system evolves according to the canonical equations of motion,
(2) q̇i = ∂H(q, p, t)/∂pi,
(3) ṗi = −∂H(q, p, t)/∂qi,
where the function
H(q,p, t) = H(q1, . . . , qn, p1, . . . , pn, t)
is called the Hamiltonian of the system. These equations represent the full content
of Newtonian mechanics. Note that exactly one trajectory passes through each
point in the phase space; the classical picture is completely deterministic.
Example (single particle dynamics). Find the canonical equations of motion
for a single particle of mass m in an external potential V (q).
Solution. The Hamiltonian for this system is simply
H(q, p) = p²/2m + V (q),
which we recognize as the sum of the kinetic and potential energies. This leads to
the following dynamical equations:
q̇ = p/m,
ṗ = −∂V (q)/∂q.
A system of many interacting particles has a similar solution, though the potential
term becomes much more complicated.
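These equations are simple enough to integrate numerically. The following sketch is an illustration, not part of the text; the harmonic choice V (q) = kq²/2 and all parameter values are assumptions made for the example. It advances (q, p) with a symplectic Euler step and checks that the energy p²/2m + V (q) stays near its initial value:

```python
# Illustrative sketch: integrate the canonical equations q' = p/m,
# p' = -dV/dq for the assumed harmonic potential V(q) = k*q**2/2,
# using the symplectic (kick-then-drift) Euler method.
def integrate(q, p, m=1.0, k=1.0, dt=1e-3, steps=10_000):
    for _ in range(steps):
        p -= k * q * dt        # kick:  p' = -dV/dq = -k*q
        q += (p / m) * dt      # drift: q' = p/m
    return q, p

def energy(q, p, m=1.0, k=1.0):
    return p ** 2 / (2 * m) + 0.5 * k * q ** 2

q1, p1 = integrate(1.0, 0.0)
# A symplectic integrator keeps the energy close to its initial value.
assert abs(energy(q1, p1) - energy(1.0, 0.0)) < 1e-2
```

A symplectic method is chosen deliberately: it preserves phase-space volume exactly, a property of Hamiltonian flows that we will meet again below.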
We see, therefore, that the first canonical equation (2) generalizes the relationship
between velocity and momentum (in a more complicated system, the i-th momentum
may depend on several of the qi and q̇i). Similarly, the second canonical equation
(3) generalizes the rule that force may be expressed as a gradient of an energy
function.
In a Hamiltonian system, the time dependence of any function of the momenta
and coordinates
f = f(q1, . . . , qn, p1, . . . , pn, t)
can be written,
(4) df/dt = {f, H} + ∂f/∂t,
where {f, H} is the Poisson bracket of the function f and the Hamiltonian.
The Poisson bracket of two functions f1 and f2 with respect to a set of canonical
variables is defined as
(5) {f1, f2} = ∑j=1..n ( ∂f1/∂qj · ∂f2/∂pj − ∂f1/∂pj · ∂f2/∂qj ).
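Definition (5) is easy to check numerically. The sketch below is an illustration only (the sample Hamiltonian and the evaluation point are arbitrary choices); it evaluates the bracket for one degree of freedom with central finite differences and confirms the canonical relations {q, p} = 1 and {H, H} = 0:

```python
# Hedged sketch: numerically evaluate the Poisson bracket (5) for n = 1
# using central finite differences.
def poisson_bracket(f1, f2, q, p, h=1e-5):
    df1_dq = (f1(q + h, p) - f1(q - h, p)) / (2 * h)
    df1_dp = (f1(q, p + h) - f1(q, p - h)) / (2 * h)
    df2_dq = (f2(q + h, p) - f2(q - h, p)) / (2 * h)
    df2_dp = (f2(q, p + h) - f2(q, p - h)) / (2 * h)
    return df1_dq * df2_dp - df1_dp * df2_dq

H = lambda q, p: p ** 2 / 2 + q ** 2 / 2   # example Hamiltonian (m = k = 1)
# {q, p} = 1 for any canonical pair, and {H, H} = 0 for any H.
assert abs(poisson_bracket(lambda q, p: q, lambda q, p: p, 0.3, 0.7) - 1.0) < 1e-6
assert abs(poisson_bracket(H, H, 0.3, 0.7)) < 1e-9
```

The vanishing of {H, H} is the numerical face of energy conservation: by (4), any quantity whose bracket with H vanishes is a constant of the motion.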
The Poisson bracket is important in Hamiltonian dynamics because it is independent
of how the various coordinates and momenta are defined; that is, {u, v} takes
the same value for any set of canonical variables q and p. Furthermore, the canonical
equations of motion can be re-written in the following form,
(6) q̇i = {qi, H},
(7) ṗi = {pi, H}.
This is known as the Poisson bracket formulation of classical mechanics. It is impor-
tant to recognize that very similar expressions arise in quantum mechanics (we’ll
look at these in Section 4). Indeed, every classical expression involving Poisson
brackets has a quantum analogue employing commutators. This elegant correspon-
dence principle, first pointed out by Dirac, has deep significance for the relationship
between classical and quantum physics. It also provides our first glimpse of why
statistical mechanics transcends the details of the microscopic equations of motion.
For now, we return to the classical route into the heart of statistical mechanics...
Examining a physical system from the classical mechanical point of view, one
first constructs the canonical equations of motion and then integrates these from
known initial conditions to determine the phase trajectory. If the system of in-
terest involves a macroscopic number of particles, this approach condemns one to
numerical computations involving matrices of bewildering size. Yet system size is
not the major obstacle: The canonical equations of motion are in general nonlin-
ear and, as a result, small changes in system parameters or initial conditions may
lead to large changes in system behavior. In particular, neighboring trajectories in
many nonlinear systems diverge from one another at an exponential rate, a phe-
nomenon known as sensitive dependence on initial conditions or, more popularly,
as the butterfly effect, the idea being that a flap of a butterfly’s wings may make
the difference between sunny skies and snow two weeks later. Systems exhibiting
sensitive dependence on initial conditions are said to be chaotic. Calculations of
chaotic trajectories are intolerant of even infinitesimal errors, such as those aris-
ing from finite precision and uncertainties in the state of the system. Therefore,
setting aside the impractical integration problem of calculating a high-dimensional
phase trajectory, our necessarily incomplete knowledge of initial conditions in a
macroscopic system seriously compromises our ability to predict future evolution.
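Sensitive dependence is easy to exhibit numerically. The sketch below is an illustration not drawn from the text: it iterates Arnold’s cat map, a standard area-preserving toy model of chaotic Hamiltonian dynamics, from two initial conditions differing by one part in 10¹⁰ (the map and the offset are arbitrary choices); each iteration stretches separations by a factor of about 2.618:

```python
def cat(x, y, steps):
    # Arnold's cat map on the unit torus: (x, y) -> (2x + y, x + y) mod 1.
    for _ in range(steps):
        x, y = (2 * x + y) % 1.0, (x + y) % 1.0
    return x, y

a = cat(0.3, 0.4, 40)
b = cat(0.3 + 1e-10, 0.4, 40)   # same orbit, perturbed by one part in 10^10
separation = max(abs(a[0] - b[0]), abs(a[1] - b[1]))
# The tiny initial offset has been amplified by many orders of magnitude.
assert separation > 1e-6
```

After a few dozen iterations the two trajectories are effectively unrelated, which is exactly why finite-precision knowledge of initial conditions forecloses long-term prediction.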
Though the prospects for dealing directly with the phase trajectories of a macro-
scopic system of particles seem hopeless, it is not the case that we must discard
all knowledge of the microscopic physics of the system. There are many macro-
scopic phenomena which cannot be understood from a purely macroscopic point
of view. What is combustion? What determines whether a solid will be a metal
or an insulator? What are the energy sources in stellar and galactic cores? These
questions are best dealt with by appealing to various microscopic details. On the
other hand, given the success of the laws of thermodynamics, it is evident that
macroscopic systems exhibit a collective regularity where the exact details of each
particle’s motion and state are nonessential. This suggests that we may envision
the time evolution of macroscopic quantities in a Hamiltonian system as some sort
of average over all of the microscopic states consistent with available macroscopic
knowledge and constraints. For this reason, one abandons the mechanical approach
of computing the exact time evolution from a single point in phase space in favor
of a statistical approach employing averages over an entire ensemble of points in
phase space. This is accomplished as follows:
Consider a large collection of identical copies of the system, distributed in phase
space according to a known distribution function,
ρ(q, p, t) = ρ(q1, . . . , q3N , p1, . . . , p3N , t)  (n = 3N for N particles in three dimensions),
where
(8) ∫ ρ(q, p, t) dq dp = 1 for all t.
ρ(q,p, t) is the density in phase space of the points representing the ensemble,
normalized according to (8), and may be interpreted as describing the probability
of finding the system in various different microscopic states. Once ρ(q,p, t) is
specified, we can compute the probabilities of different values of any quantity f
which is a function of the canonical variables. We can also compute the mean value
〈f〉 of any such function f by averaging over the probabilities of different values,
(9) 〈f(t)〉 = ∫ f(q, p) ρ(q, p, t) dp dq.
Thus, instead of following the time evolution of a single system through many
different microscopic states, we consider at a single time an ensemble of copies of
the system distributed into these states according to probability of occupancy. This
shift is one of the cornerstones of statistical mechanics.
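The ensemble average (9) can be made concrete with a small Monte Carlo sketch. This is an illustration, not part of the text: we assume, purely for the example, a distribution ρ that factors into independent standard Gaussians in q and p, so that the oscillator energy f = (q² + p²)/2 has ensemble average 1:

```python
import random

# Illustrative Monte Carlo version of the ensemble average (9): draw
# phase-space points (q, p) from an assumed distribution rho and average
# f(q, p) over the samples.
random.seed(0)
f = lambda q, p: 0.5 * (q ** 2 + p ** 2)
samples = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200_000)]
avg = sum(f(q, p) for q, p in samples) / len(samples)
# For unit Gaussians, <q**2> = <p**2> = 1, so the ensemble average is 1.
assert abs(avg - 1.0) < 0.05
```

No trajectory was integrated here: the average is taken over copies of the system at a single instant, which is exactly the shift described above.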
Exercise 1.1. Derive equation (4). HINT: Use the chain rule
df/dt = ∑i (∂f/∂qi)(dqi/dt) + ∑i (∂f/∂pi)(dpi/dt) + ∂f/∂t.
Exercise 1.2. Show that H(q,p, t) is a constant of the motion if and only if
it does not depend explicitly on time.
Exercise 1.3. Show that the canonical equations of motion can be re-written
in the following form,
(10) q̇i = {qi, H},
(11) ṗi = {pi, H}.
This is known as the Poisson bracket formulation of classical mechanics.
Exercise 1.4. Compute the following Poisson brackets:
(1) {qi, qj}
(2) {qi, pj}
Are your results in any way familiar, given your knowledge of quantum mechan-
ics? If so, how do the interpretations of these results differ from their quantum
mechanical analogues?
Exercise 1.5. Show that the canonical equations of motion can be written in
the symplectic form,
ẋ = M ∂H(q, p, t)/∂x,
where x = [q, p] (what’s M in this expression?)
2. Two Theorems From Dynamical Systems Theory
One is often interested in general qualitative questions about a system’s dy-
namics, such as the existence of stable equilibria or oscillations. In discussing such
questions, mathematicians often speak of the flow of a dynamical system: Any
autonomous system of ordinary differential equations can be written in the form
(12) ẋ = f (x)
(changes of variables may be required if the equations involve second-order and
higher derivatives). If we interpret a general system of differential equations (12)
as representing a fluid in which the fluid velocity at each point x is given by the
vector f(x), then we may envision any particular point x0 as flowing along the
trajectory φ(x0) defined by the velocity field. More precisely, we define
φt(x0) = φ(x0, t),
where φ(x0, t) is a point on the trajectory φ(x0) passing through the initial condition
x0; φt maps the starting point x0 to its location after moving with the flow for a
time t. It is important to note that φt defines a map on the entire phase space —
we may envision the entire phase space flowing according to the velocity field defined
by (12). Indeed, we shall see in this section that this fluid metaphor is especially
appropriate in statistical mechanics.
The notion of the flow of a dynamical system very naturally accommodates a shift
towards considering how whole regions of phase space participate in the dynamics,
a shift away from the language of initial conditions and trajectories. This shift is
what enables mathematicians to state and prove general theorems about dynamical
systems. It also turns out that this shift provides the natural setting for several of
the central concepts of statistical mechanics. In the previous section, we motivated
a statistical framework in which, rather than follow the time evolution of a single
system, we consider at a single time an ensemble of copies of that system distributed
in phase space according to probability of occupancy. The main player in this new
framework is the distribution function ρ(q, p, t) describing the ensemble. ρ allows
us to take into account which states in phase space a system is
likely to occupy1. In this section, we examine how the ensemble interacts with the
flow defined by a set of canonical equations. It turns out that, in a Hamiltonian
system, the time evolution of ρ has several interesting properties, which are the
subject of two important theorems from dynamical systems theory.
We begin with a simple calculation of the rate of change of ρ. We know from
(4), which describes the time evolution of any function of the canonical variables q
and p, that
(13) dρ/dt = ∂ρ/∂t + {ρ, H}.
However, we also know from local conservation of probability that ρ must satisfy a
continuity equation,
(14) ∂ρ/∂t + ∇ · (ρv) = 0,
where
∇ = ( ∂/∂q1, . . . , ∂/∂qn, ∂/∂p1, . . . , ∂/∂pn )
is the gradient operator in phase space and v is defined in (1). Applying the chain
rule, we see that
(15) ∇ · (ρv) = {ρ, H} + ρ(∇ · v).
Since ∇ · v vanishes for a Hamiltonian system, (13) and (14) are equal and
therefore
(16) ∂ρ/∂t + {ρ, H} = 0
and
(17) dρ/dt = 0.
This result is known as Liouville’s theorem. The partial derivative term in (16)
expresses the change in ρ due to elapsed time dt, while the (∇ρ) · v = {ρ, H}
term expresses the change in ρ due to motion along the vector field a distance v dt.
1Mathematicians include this as part of a more general approach, called measurable dynamics,
which we need not go into here.
Thus, Liouville’s theorem tells us that the local probability density — as seen by
an observer moving with the flow in phase space — is constant in time; that is, ρ is
constant along phase trajectories. The theorem can also be interpreted as stating
that, in a Hamiltonian system, phase space volumes are conserved by the flow or,
equivalently, that ρ moves in phase space like an incompressible fluid.
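Liouville’s theorem can be checked numerically. In the sketch below (an illustration; the pendulum Hamiltonian H = p²/2 − cos q and all parameters are assumptions made for the example), a symplectic leapfrog scheme approximates the time-T flow map, and a finite-difference Jacobian confirms that the map preserves phase-space area (determinant equal to 1):

```python
import math

# Sketch: verify numerically that the Hamiltonian flow of the pendulum,
# H = p**2/2 - cos(q), preserves phase-space volume (Liouville's theorem).
def flow(q, p, dt=1e-3, steps=2000):
    for _ in range(steps):          # leapfrog (kick-drift-kick) integration
        p -= math.sin(q) * dt / 2   # half kick:  p' = -dV/dq = -sin(q)
        q += p * dt                 # full drift: q' = p
        p -= math.sin(q) * dt / 2   # half kick
    return q, p

def jacobian_det(q, p, h=1e-6):
    # Finite-difference Jacobian of the time-T flow map (q0, p0) -> (q, p).
    qp_plus, qp_minus = flow(q + h, p), flow(q - h, p)
    pq_plus, pq_minus = flow(q, p + h), flow(q, p - h)
    dq_dq = (qp_plus[0] - qp_minus[0]) / (2 * h)
    dp_dq = (qp_plus[1] - qp_minus[1]) / (2 * h)
    dq_dp = (pq_plus[0] - pq_minus[0]) / (2 * h)
    dp_dp = (pq_plus[1] - pq_minus[1]) / (2 * h)
    return dq_dq * dp_dp - dq_dp * dp_dq

# Phase-space area is conserved: the Jacobian determinant is 1.
assert abs(jacobian_det(0.8, 0.3) - 1.0) < 1e-3
```

A dissipative system would give a determinant less than 1, with trajectories contracting onto an attractor; the unit determinant here is the numerical signature of incompressible flow.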
From the incompressible fluid analogy, we see that while Hamiltonian systems
can exhibit chaotic dynamics, they cannot have any attractors! Liouville’s theorem
has other important consequences when combined with system constraints, such
as conservation laws. Conservation laws constrain the flow to lie on families of
hypersurfaces in phase space. These surfaces are bounded and invariant under the
flow:
φt(S) = S
for each hypersurface S defined by a conservation law. When volume-preserving
flows are restricted to bounded, invariant regions of phase space, a surprising result
emerges: Let X be a bounded region of phase space which is invariant under a
volume-preserving flow. Take any region S which occupies a finite fraction of the
total volume in X (this specifically excludes what mathematicians call sets of mea-
sure zero: sets with no volume). Then any randomly selected initial condition x in
S generates a trajectory φt(x) which returns to S infinitely often — this is known
as the Poincare recurrence theorem.
In order to understand where this theorem comes from and what it means, we
consider how the region S moves under the flow. Define a function f which maps
S along the flow for a time T ,
f(S) = φT (S).
Subsequent iterations of this time-T map produce a sequence of subsets of X,
f²(S) = φ2T(S), f³(S) = φ3T(S), and so on, all with the same finite volume in X (the
flow is volume-preserving). Since X itself has finite volume, these subsets cannot all
be disjoint: iterating enough times would otherwise exhaust all of the volume in X.
As a result, two of these subsets must intersect; i.e., there must exist integers i and
j, with i > j, such that fⁱ(S) ∩ fʲ(S) is non-empty. This implies that fⁱ⁻ʲ(S) ∩ S is
also non-empty. S must fold back on itself repeatedly
under this time-T flow map. By considering small subsets of S, which must also have
this property, we can convince ourselves that a randomly selected point in S does
indeed return to S infinitely often (for a precise proof of the theorem, see references
at end of chapter). The Poincare recurrence theorem as stated implies that almost
every initial condition x0 in the bounded region X generates a trajectory which
returns arbitrarily close to x0 infinitely many times. This recurrence property is
truly remarkable when you consider the bewildering array of nonlinear Hamiltonian
systems to which it may be applied. Indeed, the Poincare recurrence theorem is
considered the first great theorem of modern dynamics; we will have more to say
about its role in statistical mechanics later on.
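A finite sketch of recurrence (an illustration, not part of the theorem’s proof): Arnold’s cat map, (x, y) → (2x + y, x + y) mod 1, preserves area on the torus, and on the grid of rational points with denominator N it simply permutes a finite set, so every orbit is periodic and returns to its starting point again and again:

```python
# Sketch of Poincare recurrence in a measure-preserving map. Arnold's cat
# map preserves area on the torus; restricted to the finite grid of points
# (k/N, l/N) it is a permutation of a finite set, so every orbit is
# periodic: the initial point recurs infinitely often.
N = 10                       # exact integer arithmetic on the grid (k/N, l/N)
start = (1, 2)               # represents the point (0.1, 0.2)
x, y = start
returns = 0
for _ in range(1000):
    x, y = (2 * x + y) % N, (x + y) % N
    if (x, y) == start:
        returns += 1
assert returns >= 2          # the orbit returns to its start repeatedly
```

The continuous theorem is subtler (returns are to a neighborhood, not an exact point, and hold only for almost every initial condition), but the volume-counting mechanism is the same.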
3. Ergodicity and the Microcanonical Ensemble
Liouville’s Theorem has profound consequences for a system in equilibrium.
An ensemble is said to be stationary if the probability density does not depend
explicitly on time,
(18) ∂ρ(q, p, t)/∂t = 0.
This restriction guarantees that all ensemble averages will be time-independent;
we therefore expect that systems in equilibrium can be represented by stationary
ensembles. Note that a stationary ensemble satisfying Liouville’s Theorem (17)
has a vanishing Poisson bracket with the Hamiltonian,
(19) {ρ(q, p), H(q, p)} = 0.
Since {qi, pj} = δij (where δij is the Kronecker delta), no function of q
or p alone will satisfy (19). The general solution for a stationary ensemble has the
form
ρ(q, p) = ρ(H(q, p)).
The simplest example of a stationary ensemble is the microcanonical ensem-
ble, for which the distribution ρ(q, p) is at all times uniformly distributed over all
accessible microstates defined by constant energy. The assumption that the micro-
canonical ensemble is valid is one of the cornerstones of statistical mechanics and
is known as the postulate of equal a priori probabilities.
—————————————————————————————————-
Birkhoff pointwise ergodic theorem: For almost all x,
lim T→∞ (1/T) ∫0^T f(φt(x)) dt = f∗(x).
The important statement here is that functions of dynamical variables, when
averaged along trajectories, converge almost everywhere to something. The limit
may depend on x, which is why f∗ is written above as a function of x and is why
mathematicians call this a “pointwise” theorem, but the limit almost always exists.
(We have to say almost always because...). Moreover,
f∗(φt(x)) = f∗(x)
(f∗ is invariant under the flow) and
∫ f∗ ρ dx = ∫ f ρ dx
(the ensemble average of f equals the ensemble average of the limiting time average
of f; somehow the time averaging doesn’t affect the ensemble average).
Definition: The flow is ergodic if every invariant set S (φt(S) = S) has m(S) = 0
or 1 (the only invariant sets are those with the volume of the entire space and
those with zero volume). Loosely put, this means that almost every trajectory
wanders almost everywhere (on its energy surface). Compare this with the Poincare
recurrence theorem: ergodicity is a far stronger property than mere recurrence, and
many physics textbooks blur this distinction.
One more theorem, stated without proof: the flow is ergodic if and only if every
invariant function — every f with f(φt(x)) = f(x) — is constant (almost everywhere).
This last theorem leads us to the microcanonical ensemble. Since ρ is invariant
under the flow, by Liouville’s theorem, ergodicity means ρ is constant (on the energy
surface). So the assumption of equal a priori probabilities is really just the
assumption that the system is ergodic; this foray into dynamics helps us clarify
what we’re assuming!
Now back to a general f for an ergodic system. We know f∗ must be a constant
(almost everywhere). We can actually compute this constant by integrating over
the ensemble:
∫ f∗ ρ = f∗ ∫ ρ = f∗.
But, by (9), this is just the ensemble average of f: the time average equals the
ensemble average, which is what we are really after in statistical mechanics.
We can use this to justify a constant ρ as follows: applying the result to the
indicator function of a region shows that trajectories spend, on average, the same
amount of time in every region of equal volume.
—————————————————————————————————-
For a stationary ensemble, the ensemble average 〈f〉 of (9) is time-independent.
Thus the time average of 〈f〉 equals 〈f〉,
(20) 〈f〉 = (1/T) ∫0^T 〈f〉 dt.
Switching the order of integration, it follows that 〈f〉 equals the ensemble average
of the time-average of f(q, p),
(21) 〈f〉 = ∫ ( (1/T) ∫0^T f(p, q) dt ) ρ(p, q) dp dq.
If the averaging time T is short, the right-hand side of this equation clearly depends
on the particular microscopic states occupied. However, for a microcanonical en-
semble and a sufficiently long time T , the average will turn out to depend only
on the macroscopic constraints. The reason for this is that a probability density
spread uniformly throughout the accessible region of phase space implies that a
phase trajectory confined to this region wanders uniformly as well; that is, given
sufficient time, the trajectory enters any neighborhood of every point in the region
— mathematicians call this the ergodic theorem. Therefore, for time T sufficiently
large, the time-average of f(p, q) is the same for every member of the ensemble.
From this follows the major result, glimpsed earlier, that long-time averages equal
ensemble averages,
(22) (1/T) ∫0^T f(p, q) dt = ∫ f(p, q) ρ(p, q) dp dq,
independent of initial and final states.
We can clarify the interplay of these statistical ideas in the following summary,
due to Landau and Lifshitz: The ergodic theorem states that after a sufficiently long
time the system’s phase trajectory will return to any neighborhood of an allowed
point in phase space. Let ∆T represent the small part of the total time T that the
trajectory spends in a small phase volume ∆q∆p. If we combine ergodicity with the
microcanonical ensemble, then this trajectory spends on average the same amount
of time everywhere —
(23) lim T→∞ ∆T/T = w,
where w is some fixed proportion. It follows from (23) and the definition of the
density ρ that
(24) dw = ρ(q, p) dq dp.
Combining (9) and (24),
(25) ensemble average = ∫ f(p, q) ρ(p, q, t) dp dq = ∫ f(p, q) dw
= lim T→∞ (1/T) ∫0^T f(p, q) dt = time average,
we see that, indeed, statistical averaging over the ensemble at fixed time is equivalent
to time-averaging a single member of the ensemble. This is what allowed the loose
use of averaging in our discussion of pressure in an ideal gas to work; this is what
allowed us to ignore the time evolution and only consider what a typical gas molecule
was doing on average. Furthermore, to the extent that all measurements in the
lab are time averages, ergodicity and the microcanonical ensemble firmly ground
macroscopic measurements in the microscopic statistical dynamics of the system
being investigated.
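The equality of time and ensemble averages in (25) can be observed directly in a toy ergodic system. The sketch below is an illustration, not from the text: rotation of the circle by the irrational angle √2 is ergodic with respect to the uniform measure, so the time average of f(x) = cos²(2πx) along a single orbit should approach the ensemble average ∫0^1 f dx = 1/2:

```python
import math

# Sketch: the time average along one orbit of an ergodic map equals the
# ensemble average over the invariant (here uniform) measure.
alpha = math.sqrt(2)                         # irrational rotation number
f = lambda x: math.cos(2 * math.pi * x) ** 2
x, total, T = 0.1, 0.0, 100_000
for _ in range(T):
    total += f(x)
    x = (x + alpha) % 1.0                    # rotate the circle by alpha
time_avg = total / T
# Ensemble average of cos(2*pi*x)**2 over the uniform measure is 1/2.
assert abs(time_avg - 0.5) < 1e-2
```

The circle rotation is far simpler than a Hamiltonian energy surface, but the logic is the same: one long trajectory samples the invariant measure, so the clock average reproduces the ensemble average.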
4. Quantum Mechanics and Density Matrices
In classical physics, the state of a system at some fixed time t is uniquely
defined by specifying the values of all of the generalized coordinates qi(t) and mo-
menta pi(t). In quantum mechanics, however, the Heisenberg uncertainty principle
prohibits simultaneous measurements of position and momentum to arbitrary precision.
We might therefore anticipate some revisions in our approach. It turns
out, however, that the classical ensemble theory developed above carries over into
quantum mechanics with hardly any revision at all. Most of the necessary alterations
are built directly into the edifice of quantum mechanics and all we need is to find
suitable quantum mechanical replacements for the density function ρ(q, p) and Li-
ouville’s Theorem. Understanding this is the goal of this section. Readers who are
unfamiliar with Dirac notation and the basic concepts of quantum mechanics are
referred to the references at the end of the chapter.
The uncertainty principle renders the concept of phase space meaningless in
quantum mechanics. The quantum state of a physical system is instead repre-
sented by a state vector, |ψ〉, belonging to an abstract vector space called the
state space of the system. The use of an abstract vector space stems from the
important role that superposition of states plays in quantum mechanics — lin-
ear combinations of states provide new states and, conversely, quantum states can
always be decomposed into linear combinations of other states. The connection
between these abstract vectors and experimental results is supplied by the formal-
ism of linear algebra, by operators and their eigenvalues. Dynamical variables,
such as position and energy, are represented by self-adjoint linear operators on the
state space and the result of any measurement made on the system is always rep-
resented by the eigenvalues of the appropriate operator (that is, the eigenvectors
of an observable physical quantity form a basis for the entire state space). This
use of operators and eigenvalues directly encodes many of the distinct hallmarks of
quantum mechanical systems: Discretization, such as that of angular momentum or
energy observed in numerous experiments, simply points to an operator
with a discrete spectrum of eigenvalues. And wherever the order in which several
different measurements are made may affect the results obtained, the associated
quantum operators do not commute.
In quantum mechanics, the time evolution of the state vector is described by
Schrodinger’s equation,
(26) iℏ (∂/∂t)|ψ(t)〉 = H(t) |ψ(t)〉,
where H(t) is the Hamiltonian operator for the system; this evolution law replaces
the canonical equations of classical mechanics.
Exercise 1.6 (single particle dynamics). Write down, using wavefunctions
ψ(q, t), Schrodinger’s equation for a single particle of mass m in an external po-
tential V (q).
Solution. Recall that the classical Hamiltonian for this system is simply
H(q, p) = p²/2m + V (q).
We transform this into a quantum operator by replacing q and p with the appropriate
quantum operators: q is the position operator and
p = (ℏ/i) ∂/∂q
is the momentum operator for a wavefunction ψ(q, t). Then, Schrodinger’s equation
(26) becomes the following partial differential equation,
(27) iℏ ∂ψ(q, t)/∂t = ( −(ℏ²/2m) ∇² + V (q) ) ψ(q, t).
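Equation (27) can be checked against a known solution. As an illustration not taken from the text (the harmonic choice V (q) = q²/2 with ℏ = m = 1 is an assumption made for the example), the Gaussian ψ(q) = e^(−q²/2) is an energy eigenstate with E = 1/2, so applying the right-hand side of (27) should reproduce Eψ pointwise:

```python
import math

# Sketch: for V(q) = q**2/2 with hbar = m = 1, psi(q) = exp(-q**2/2) is an
# energy eigenstate with E = 1/2. Check H psi = E psi pointwise using a
# finite-difference second derivative.
def H_psi(psi, q, h=1e-4):
    d2 = (psi(q + h) - 2 * psi(q) + psi(q - h)) / h ** 2   # psi''(q)
    return -0.5 * d2 + 0.5 * q ** 2 * psi(q)

psi = lambda q: math.exp(-q ** 2 / 2)
for q in (-1.0, 0.0, 0.5, 2.0):
    assert abs(H_psi(psi, q) - 0.5 * psi(q)) < 1e-5
```
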
Schrodinger’s equation has a number of nice properties. First, as a linear
equation, it directly expresses the principle of superposition built into the vector
structure of the state space — linear combinations of solutions to (26) provide new
solutions. In addition, it can be shown that the norm of a state vector, 〈ψ(t)|ψ(t)〉,
is invariant in time; this turns out to have a nice interpretation in terms of local
conservation of probability. On the other hand, the Schrodinger equation is not easy
to solve directly. Even a system as simple as the one-dimensional harmonic os-
cillator requires great dexterity. For a macroscopic system, (26) generates either
an enormous eigenvalue problem or a high-dimensional partial differential equation
(consider the generalization of (27) to a many-body system). Either way, we see
that direct solution is hopeless. The situation is essentially identical with that of
macroscopic classical mechanics — the mathematics and, more importantly, our
lack of information about the microscopic state (quantum numbers, in this case)
necessitate a statistical approach.
We would like to find a quantum mechanical entity that replaces the classical
probability density ρ(q,p), which uses probabilities to represent our ignorance of
the true state of the system. Unfortunately, the usual interpretation of quantum
mechanics already employs probabilities on a deeper level: If the measurement of
some physical quantity A in this system is made a large number of times (i.e. on
a large ensemble of identically prepared systems), the average of all the results
obtained is given by the expectation value
(28) 〈A〉 = 〈ψ|A|ψ〉,
provided the quantum state |ψ(t)〉 is properly normalized to satisfy 〈ψ(t)|ψ(t)〉 = 1.
In order to understand the consequences of this, we introduce a basis of eigenstates
for the operator A. Let |ai〉 be the eigenvector corresponding to the eigenvalue ai.
Since the |ai〉 form a basis, we can expand the identity operator as follows,
(29) 1 = ∑i |ai〉〈ai|.
Inserting this operator into (28) twice, we obtain
(30) 〈A〉 = ∑i ai |〈ai|ψ〉|².
Comparing this result to the definition of the expectation value,
(31) 〈A〉 = ∑i ai p(ai),
we see that |〈ai|ψ〉|² must be interpreted as representing the probability p(ai) of
obtaining ai as the result of the measurement. This probabilistic framework replaces
the classical notion of a dynamical variable having a definite value. While the ex-
pectation value of A is a definite quantity, particular measurements are indefinite
— in quantum mechanics we can only talk about the probabilities of different out-
comes of an experiment. Now we can introduce an ensemble. Instead of considering
a single state |ψ〉, let pk represent the probability of the system being in a quantum
state represented by the normalized state vector |ψk〉. If the system is actually in
state |ψk〉, then the probability of measuring ai is simply |〈ai|ψk〉|². If, however,
we are uncertain about the true state then we have to average over the ensemble.
In this case, the total probability of measuring ai is given by
(32) p(ai) = ∑k pk |〈ai|ψk〉|² = 〈ai| ( ∑k |ψk〉 pk 〈ψk| ) |ai〉.
The object in parentheses in this last expression,
(33) ρ = ∑k |ψk〉 pk 〈ψk|,
is known as the density operator. (33) turns out to be exactly what we’re look-
ing for, the quantum mechanical operator corresponding to the classical density
function ρ(q, p). Recall that the classical density satisfies the following properties:
(1) Non-negativity of probabilities: ρ(q, p) must be non-negative for all points
in the phase space.
(2) Normalization of probabilities: ∫ ρ(q, p) dq dp = 1.
(3) Expectation values: The average value of a dynamical variable A(p, q)
across the entire ensemble represented by ρ(q, p) is given by
〈A〉 =∫A(q, p)ρ(q, p) dq dp.
These properties carry over into the quantum mechanical setting, with appropriate
modification (see exercises). In particular, it can be shown that
〈A〉 = trace(Aρ).
Apart from traces over a density operator replacing integration over the classical
ensemble, the statistical description of a complex quantum system is essentially no
different than that of a complex classical system. The time evolution of the density
operator ρ will be given by a quantum version of Liouville’s Theorem and will lead
to the same notions of a microcanonical ensemble and ergodicity.
First, we derive the quantum evolution law for ρ. Using the product rule, we can
write
(34) iℏ ∂ρ/∂t = ∑k iℏ [ (∂/∂t |ψk〉) pk 〈ψk| + |ψk〉 pk (∂/∂t 〈ψk|) ].
Substituting the Schrodinger equation (26) and its adjoint, this reduces to
(35) iℏ ∂ρ/∂t = ∑k [ (H|ψk〉) pk 〈ψk| − |ψk〉 pk (〈ψk|H) ] = Hρ − ρH.
Thus,
(36) ∂ρ/∂t = −(1/iℏ) [ρ, H],
where [ρ, H] = ρH − Hρ is called the commutator of ρ and H. Note the striking
resemblance between (36) and Liouville’s Theorem — the commutator of the density
and Hamiltonian operators has replaced the classical Poisson bracket of the density
and Hamiltonian functions but the expressions are otherwise identical. This is a
special case of a correspondence first pointed out by Dirac:
classical Poisson bracket, {u, v} −→ quantum commutator, (1/iℏ) [u, v].
As in the classical setting, a stationary ρ should be independent of time; for an
equilibrium quantum system, ρ must therefore be a function of the Hamiltonian,
ρ(H). The simplest choice is again a uniform distribution,
(37) ρ = ∑k |ψk〉 (1/n) 〈ψk|,
where n is the number of states |ψk〉 in the ensemble. This is the quantum micro-
canonical ensemble. It is essentially the same as the classical one, except discrete.
The same statistical principles apply; we simply switch to a discretized formalism,
with traces over operators replacing integrals over phase space.
Exercise 1.7. Show that the eigenvalues of the density operator are non-
negative.

Solution. Let ρ′ represent any eigenvalue of ρ and let |ρ′〉 be the eigenvector
associated with this eigenvalue. Then

Σ_k |ψ_k〉 p_k 〈ψ_k|ρ′〉 = ρ|ρ′〉 = ρ′|ρ′〉.

Multiplying on the left by 〈ρ′|, we obtain

Σ_k p_k |〈ψ_k|ρ′〉|² = ρ′〈ρ′|ρ′〉.

It follows that, since the p_k are positive and 〈ρ′|ρ′〉 is non-negative, ρ′ cannot be
negative. Since the eigenvalues of ρ play the role of the classical probabilities, this
result mirrors property (1) above.
Exercise 1.8. Show that the matrix representation of ρ in any basis satisfies

(38) trace ρ = 1.

Solution. Consider a basis of eigenstates |a_i〉 of the operator A. The matrix
elements ρ_ij = 〈a_i|ρ|a_j〉 are the representation of ρ in this basis. Then,

trace ρ = Σ_i 〈a_i|ρ|a_i〉 = Σ_i Σ_k p_k |〈ψ_k|a_i〉|²
        = Σ_k p_k (Σ_i |〈ψ_k|a_i〉|²) = Σ_k p_k = 1.

Since the trace is invariant under a change of basis, this result holds for any basis.
The condition trace ρ = 1 should be compared to the normalization property (2)
above.
Exercise 1.9. Show that, in a quantum ensemble represented by the operator
ρ, the expectation value of an operator A satisfies

(39) 〈A〉 = trace(Aρ).

Solution.

〈A〉 = Σ_k p_k 〈ψ_k|A|ψ_k〉 = Σ_{k,i} p_k 〈ψ_k|a_i〉〈a_i|A|ψ_k〉
    = Σ_{i,k} 〈a_i|A|ψ_k〉 p_k 〈ψ_k|a_i〉 = Σ_i 〈a_i|Aρ|a_i〉 = trace(Aρ).

This result should be compared to the classical definition of expectation value, prop-
erty (3) above.
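The three properties established in Exercises 1.7-1.9 are easy to confirm numerically for a randomly generated ensemble. A small sketch, assuming numpy (the dimension, probabilities, and observable below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_state(dim):
    """A normalized random ket, as a complex vector."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

# An ensemble of three states with probabilities p_k summing to 1.
dim = 5
p = np.array([0.5, 0.3, 0.2])
states = [random_state(dim) for _ in p]

# rho = sum_k |psi_k> p_k <psi_k|
rho = sum(pk * np.outer(v, v.conj()) for pk, v in zip(p, states))

# Eigenvalues of rho are non-negative (Exercise 1.7).
eigvals = np.linalg.eigvalsh(rho)
assert np.all(eigvals > -1e-12)

# trace(rho) = 1 (Exercise 1.8).
assert np.isclose(np.trace(rho).real, 1.0)

# <A> = trace(A rho) (Exercise 1.9), checked against the direct
# ensemble average sum_k p_k <psi_k|A|psi_k>.
B = rng.normal(size=(dim, dim))
A = B + B.T  # a Hermitian observable
direct = sum(pk * (v.conj() @ A @ v) for pk, v in zip(p, states))
assert np.isclose(np.trace(A @ rho), direct)
```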
5. The Validity of the Ergodic Hypothesis

One important feature of Hamiltonian dynamics is the equal status given to
coordinates and momenta as independent variables, which allows a great deal of
freedom in selecting which quantities to designate as coordinates and momenta (the
q_i and p_i are often called generalized coordinates and momenta). Any set of
variables which satisfies the canonical equations (2-3) is called a set of canonical
variables. One may transform between different sets of canonical variables; these
changes of variables are called canonical transformations. Note that while the
form of the Hamiltonian depends on how the chosen set of canonical variables is
defined, the form of the canonical equations is by definition invariant under canonical
transformations.
Hamiltonian systems have a great deal of additional structure. The quantity,

(40) ∮_γ p · dq = Σ_{i=1}^{n} ∮_γ p_i dq_i,

known as Poincaré's integral invariant, is independent of time if the evolution of
the closed path γ follows the flow in phase space. The left-hand side of (40) is also
known as the symplectic area. This result can be generalized if we extend our phase
space by adding a dimension for the time t. Let Γ_1 be a closed curve in phase space
(at fixed time) and consider the tube of trajectories in the extended phase space
passing through points on Γ_1. If Γ_2 is another closed curve in phase space enclosing
the same tube of trajectories, then

(41) ∮_{Γ_1} (p · dq − H dt) = ∮_{Γ_2} (p · dq − H dt).

This result, that the integral ∮ (p · dq − H dt) takes the same value along any two
paths around the same tube of trajectories, is called the Poincaré-Cartan integral
theorem. Note that if both paths are taken at fixed time, then (41) simply reduces
to (40).
Structure of this sort, as well as the presence of additional invariant quantities,
greatly constrains the flow in phase space, and one may wonder whether this struc-
ture is compatible with the ergodic hypothesis and the microcanonical ensemble.
The most extreme illustration of the conflict is the special case of integrable Hamil-
tonian systems. A time-independent Hamiltonian system is said to be integrable
if it has n independent global constants of the motion (one of which is the Hamil-
tonian itself), no two of which have a non-zero Poisson bracket. The existence
of n invariants confines the phase trajectories to an n-dimensional subspace
(recall that the entire phase space is 2n-dimensional; this is a significant reduction
of dimension). The independence of these invariants guarantees that none can be
expressed as a function of the others. The last condition, that no two of the in-
variants have a non-zero Poisson bracket, restricts the topology of the manifold to
which the trajectories are confined — it must be an n-dimensional torus. A canonical
transformation to what are known as action-angle variables, for which

I_i = (1/2π) ∮_{γ_i} p · dq

provides the canonical momenta and the angle θ_i around the loop γ_i provides the
canonical coordinates, simplifies the description immensely: Each I_i determines a
frequency for uniform motion around the loop defined by γ_i, generating tra-
jectories which spiral uniformly around the surface of the n-torus. For most choices
of the I_i, a single trajectory will fill up the entire torus; this is called quasi-
periodic motion. The microcanonical ensemble, for which the trajectories wander
ergodically on a (2n − 1)-dimensional energy surface, captures none of this struc-
ture. On one hand, highly structured Hamiltonian systems appear to exist in
Nature, the premier example being our solar system. On the other hand, we have
the remarkable success of statistical mechanics (and its underlying hypotheses
of ergodicity and equal a priori probabilities) in providing a foundation for thermo-
dynamics and condensed matter physics. This success remains a mystery.
6. Statistical Fluctuations

In this section, we consider some of the statistical consequences of having a
large number N of particles in a physical system. In particular, we examine two
important theorems from probability theory, the law of large numbers and the
central limit theorem. We also derive the important result that, relative to the
size of the system, the size of characteristic fluctuations falls off like 1/√N as N
increases. This result has consequences for the thermodynamic uniformity observed
in macroscopic systems and will lead us, in the next section, to the concept of
entropy. The central limit theorem is a very deep result responsible for the prevalence
of normal distributions in nature.
There are two prototypical systems used in physics to introduce the topic of
fluctuations — coin flipping and the one-dimensional random walk — and these
two systems are, in fact, mathematically identical. In a random walk, we imagine a
particle which takes discrete steps in randomly chosen directions, where every step
is totally uncorrelated with all previous steps. Though some texts use the colorful
analogy of a drunk man stumbling around, the physical motivation for interest
in the random walk grew out of the problem of Brownian motion, which we will
examine in a later chapter. For a one-dimensional random walk, there are only two
possibilities, a step to the “right” and a step to the “left”, and it is usually assumed
that all steps have equal size s. Let p be the probability of a step in the +x direction
and q = 1 − p be the probability of a step in the −x direction. After a total of N steps,
how far will the particle have moved? If n represents the number of steps (out of
N total) taken in the +x direction, then the net distance traveled d is given by
d = ns− (N − n)s = (2n−N)s
and we can show that the probability of traveling this far is given by

(42) P(n) = [N!/(n!(N − n)!)] p^n (1 − p)^(N−n).

This result is known as the binomial distribution, since P(n) represents a typical
term in the binomial expansion

(43) (p + q)^N = Σ_{n=0}^{N} [N!/(n!(N − n)!)] p^n q^(N−n).
The derivation of (42) involves an understanding of the use of factorials in combi-
natorics and is left to the exercises.
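Equation (42) can also be checked against a direct Monte Carlo simulation of the walk. A small sketch (the parameters N = 20, p = 0.3, and the number of trials are arbitrary choices):

```python
import math
import random

random.seed(42)

N, p = 20, 0.3
trials = 200_000

# Empirical distribution of n, the number of +x steps out of N.
counts = [0] * (N + 1)
for _ in range(trials):
    n = sum(random.random() < p for _ in range(N))
    counts[n] += 1

# Exact binomial probabilities, equation (42).
def binom(n):
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

for n in (4, 5, 6, 7, 8):
    print(n, counts[n] / trials, round(binom(n), 4))
# The empirical frequencies track the exact probabilities.
```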
Coin flipping, a more everyday example of a random system, leads to the same
probability distribution: If we let p represent the probability of obtaining the result
“heads” in a single coin flip, then the probability of obtaining n heads in N inde-
pendent coin flips is given by the binomial formula (42). We all have the intuitive
sense that a fair coin, that is, one with p = 0.5, lands heads half of the time, on
average. Many people believe in a “law of averages” operating here — fluctuations
in repeated trials tend to disappear after many trials; the fluctuations “average
out”. This popular intuition is responsible for many bad gambling decisions. It is
not true that after obtaining a long string of heads, additional flips are more likely
to land tails in order to even things out; a fair coin always has a 50-50 chance of
landing heads, regardless of the results of all previous flips. It is true however that,
of all the possible outcomes for a large number of coin flips, landing all heads is
unlikely and becomes more unlikely as the number of trials considered increases.
The precise statement of this result is known to mathematicians as the law of
large numbers:
In repeated, independent trials with the same probability p of
success in each trial, the probability that the percentage of suc-
cesses n/N differs from p by more than a fixed amount, ε > 0,
converges to zero as the number of trials N goes to infinity. This
holds for every positive ε.
It is important to note that significant deviations between the percentage of suc-
cesses and the chance of success p can occur. Furthermore, the law applies only
to the results of many trials considered together; it says nothing about short-term
fluctuations. The law of large numbers is quite useful in computations involving
a large number of random trials, where we may exploit the relation between the
percentage of successes and the probability of success to great effect.
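The law of large numbers, as stated above, can be illustrated by estimating P(|n/N − p| > ε) for growing N. A quick simulation sketch (the value of ε and the number of repetitions are arbitrary choices):

```python
import random

random.seed(0)
p, eps = 0.5, 0.02
experiments = 1000  # independent repetitions at each N

def deviation_prob(N):
    """Estimate P(|n/N - p| > eps) over many independent runs."""
    bad = 0
    for _ in range(experiments):
        n = sum(random.random() < p for _ in range(N))
        if abs(n / N - p) > eps:
            bad += 1
    return bad / experiments

results = {}
for N in (100, 400, 1600, 6400):
    results[N] = deviation_prob(N)
    print(N, results[N])
# The estimated probability of a deviation larger than eps falls
# toward zero as the number of trials N grows.
```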
The law of large numbers tells us to expect small fluctuations in a large system.
We'd like to know how small, however. Two of the most important quantities to
know about a probability distribution P(n) are the mean and variance of n. The
mean of n is the expectation value,

(44) 〈n〉 = Σ_n n P(n),

and the variance is given by

(45) V(n) = 〈n²〉 − 〈n〉² = Σ_n n² P(n) − 〈n〉².
It is the variance that tells us about fluctuations. A standard element in introduc-
tory discussions of the random walk is a computation of the mean and variance
of the distance traveled d using the binomial distribution. It is not difficult to see
that, for the case p = 0.5,

(46) 〈d〉 = 0,
(47) 〈d²〉 = N.

The first result makes sense: if the particle is equally likely to go left and right, then
on average it should get nowhere. The square root of the second result tells us
how far away the particle typically wanders in N steps: √N. The random walk
is a remarkably general mathematical system and we can obtain the same results
using a very elegant approach, due to Feynman: Let the vector L represent a step
of length L taken in an arbitrary direction. After N − 1 steps the net displacement
can be represented by the vector R_{N−1}. Thus,

R_N = R_{N−1} + L

and

R_N² = R_N · R_N = R_{N−1}² + 2 R_{N−1} · L + L².

Since each step is taken in a random direction, 〈R_{N−1} · L〉 = 0. It follows that

〈R_N²〉 = 〈R_{N−1}²〉 + 〈L²〉.
Therefore, by induction,

〈R_N²〉 = N〈L²〉.

This argument works in an arbitrary number of dimensions and includes random
walks for which the step length is not uniform. Given these results, we see that the
root-mean-square deviation — the square root of the variance — is proportional to
√N. Note that as N increases, the random walk is capable of further and further
excursions from the starting point. However, the relative fluctuations — the
ratio of the root-mean-square deviation to N — tend to zero: measured against the
size of the system, a large system appears increasingly uniform. In the exercises,
we examine the consequences of this important result for fluctuations in an ideal gas.
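Feynman's induction argument predicts 〈R_N²〉 = N〈L²〉 in any dimension, even for non-uniform step lengths. A simulation sketch for a two-dimensional walk with random step lengths (all parameters below are arbitrary choices):

```python
import math
import random

random.seed(7)

def walk_mean_sq(N, walkers=20_000):
    """Average squared net displacement after N steps of a 2D walk
    with a random direction and a random length on each step."""
    total = 0.0
    for _ in range(walkers):
        x = y = 0.0
        for _ in range(N):
            theta = random.uniform(0.0, 2.0 * math.pi)
            L = random.uniform(0.0, 2.0)  # non-uniform step length
            x += L * math.cos(theta)
            y += L * math.sin(theta)
        total += x * x + y * y
    return total / walkers

# <L^2> for L uniform on [0, 2] is 4/3.
L2 = 4.0 / 3.0
for N in (5, 20):
    print(N, walk_mean_sq(N), N * L2)
# The simulated <R_N^2> matches N<L^2>, as the induction predicts.
```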
Another standard element in introductory discussions of the random walk is the
large N limit, where the binomial distribution turns into a Gaussian or normal
distribution,

(48) P(d) −→ (1/√(2πσ²)) exp[−d²/(2σ²)].
This result is in fact far deeper than the binomial distribution, or even the random
walk itself; it is an expression of one of the most important theorems in probability
theory, the central limit theorem:
Given a distribution with mean m and variance σ2, the sampling
distribution of the mean approaches a normal distribution with
mean m and variance σ2/N as N , the sample size, increases.
The meaning of this statement is as follows. Consider drawing a random sample
of size N from the given distribution and calculating the mean µ of this sample.
If we were able to repeat this experiment an infinite number of times, calculating
the mean of each sample drawn, we would obtain a distribution called the sampling
distribution of the mean. This distribution describes what the mean µ of a randomly
drawn sample of size N is likely to be. The central limit theorem, stated above,
claims that
(1) The mean of the sampling distribution of means equals the mean m of the
original distribution from which the samples are drawn.
(2) The variance of the sampling distribution of means equals the variance σ2
of the original distribution from which the samples are drawn, divided by
N .
(3) Regardless of the shape of the original distribution, the sampling distri-
bution of the mean approaches a normal distribution as the sample size N
increases (remember, N is the sample size and not the number of samples).
Thus, for the random walk considered earlier, the sample size is the number of
steps taken. (1) and (2) on the above list restate our results for the mean and
variance of a random walk. (3), however, is new and expresses the statement that
the binomial distribution looks more and more like a normal distribution as the
number of steps taken gets larger. For the one-dimensional random walk, we can
take this limit directly, using Stirling's approximation to convert factorials of large
numbers into exponentials; see the exercises below.
Note that there is no conflict between the variance σ²/N appearing in the central
limit theorem and the variance N of the random walk: the theorem concerns the
mean displacement per step, d/N, whose variance is σ²/N with σ² the single-step
variance. The total displacement d = N(d/N) then has variance N²(σ²/N) = Nσ².
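The three claims of the central limit theorem can be illustrated by sampling. The sketch below draws samples of size N from a deliberately non-normal parent distribution (a single die roll, an arbitrary choice) and examines the sampling distribution of the mean:

```python
import random
import statistics

random.seed(3)

# Parent distribution: a single die roll, uniform on 1..6 (flat,
# not normal); mean 3.5, variance 35/12.
m, var = 3.5, 35.0 / 12.0

N = 50            # sample size
samples = 20_000  # number of samples drawn

means = [statistics.fmean(random.randint(1, 6) for _ in range(N))
         for _ in range(samples)]

print(statistics.fmean(means))     # close to m = 3.5
print(statistics.variance(means))  # close to var / N = 0.0583...
```

A histogram of `means` would also look bell-shaped, claim (3), even though the parent distribution is flat.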
Exercise 1.10 (Combinatorics and the factorial function).

(1) Given a list of N distinct numbers, how many different permutations of
these numbers exist? (Answer: N! = N(N − 1)(N − 2) · · · 1)
(2) Given a list of N numbers, N − m of which are distinct, how many different
permutations of these numbers exist? (Answer: N!/m!)
(3) Given a list of N distinct numbers, show that there are

N!/(N − m)!

different ways of selecting m of these numbers.
(4) Given a list of N distinct numbers, show that there are

N!/(m!(N − m)!)

different ways of selecting m of these numbers, when the order in which
they are selected does not matter. This combination of factorials is called
"N choose m".
Exercise 1.11 (The binomial distribution and the random walk). Derive equa-
tion (42).

Solution. The problem breaks into two parts: First, show that the prob-
ability of a particular sequence of steps — i.e. first two to the right, then one to
the left, and so on until N steps have been taken — is equal to p^n q^(N−n), where n
is the total number of steps taken in the +x direction and N − n is the total taken in
the −x direction. The net distance traveled is d = (2n − N)s. Next, we have to consider
all the various sequences of steps which result in a final displacement of d. There
are N choose n of these sequences; this multiplicity factor completes the
derivation.
Exercise 1.12 (Mean and variance of the binomial distribution). Derive equa-
tions (46) and (47).

Solution. First, note that

(49) (px + q)^N = Σ_{n=0}^{N} (N choose n) (px)^n q^(N−n) = Σ_n P(n) x^n,

and therefore

d/dx (px + q)^N = Σ_{n=0}^{N} n P(n) x^(n−1).

The right-hand side of this expression, evaluated at x = 1, is the mean value of
n. Therefore, q = 1 − p implies that 〈n〉 = Np. A similar argument gives us
〈n²〉 = Np + N(N − 1)p². When we express n in terms of d and substitute into the
definitions of mean and variance,

〈d〉 = Σ_d d P(d),
〈(d − 〈d〉)²〉 = 〈d²〉 − 〈d〉²,

the conclusion quickly follows. This slick derivation is due to Chandrasekhar (1943).
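The moments derived above, and equations (46)-(47) for d, can be verified by summing the binomial distribution directly. A short sketch (the choices N = 12 and p = 0.3 are arbitrary):

```python
import math

N, p = 12, 0.3

def P(n):
    """Binomial distribution, equation (42); reads N and p globally."""
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

mean_n = sum(n * P(n) for n in range(N + 1))
mean_n2 = sum(n * n * P(n) for n in range(N + 1))

assert math.isclose(mean_n, N * p)                       # <n> = Np
assert math.isclose(mean_n2, N * p + N * (N - 1) * p**2)  # <n^2>

# Mean and variance of d = 2n - N for the symmetric walk p = 1/2
# (rebinding p is picked up by P, which reads it globally):
p = 0.5
mean_d = sum((2 * n - N) * P(n) for n in range(N + 1))
var_d = sum((2 * n - N) ** 2 * P(n) for n in range(N + 1)) - mean_d**2
print(mean_d, var_d)  # approximately 0 and N, equations (46)-(47)
```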
Exercise 1.13 (The Poisson distribution). Show that the binomial distribu-
tion,

P(n) = (N choose n) p^n (1 − p)^(N−n),

for the special case of small probabilities, p ≪ 1, and infrequent events, n ≪ N,
reduces to

(50) P(n) = (λ^n / n!) exp(−λ),

where λ = Np. (50) is known as the Poisson distribution. Show that this distribu-
tion is correctly normalized. Calculate the mean and variance of n for a Poisson
distribution.
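The Poisson limit can be checked numerically by comparing the binomial probabilities against (50) for large N and small p. A sketch with the arbitrary choice N = 500, p = 0.01 (so λ = 5):

```python
import math

N, p = 500, 0.01
lam = N * p  # lambda = 5

def binom(n):
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

def poisson(n):
    return lam**n / math.factorial(n) * math.exp(-lam)

for n in range(4):
    print(n, round(binom(n), 5), round(poisson(n), 5))
# The two distributions agree closely, and the agreement improves
# as N grows with lambda = N p held fixed.
```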
In terms of N, p, and d alone, the binomial distribution reads

(51) P(d) = (N choose ½(N + d)) p^(½(N+d)) (1 − p)^(½(N−d)).

Let's return to the binomial distribution for p = 0.5,

P(d) = (N choose ½(N + d)) (1/2)^N.

The limit of this distribution for very large N and N ≫ d is of central importance.
Using Stirling's approximation,

log N! ≈ N log N − N,

we can write

log P(d) ≈ N log N − ½(N + d) log(½(N + d)) − ½(N − d) log(½(N − d)) − N log 2.

Using d ≪ N, we can Taylor expand the logarithms,

log(½(N ± d)) = log((N/2)(1 ± d/N)) = log(N/2) ± d/N − d²/(2N²).

With these approximations, the expression for log P(d) reduces to

log P(d) ≈ −d²/(2N).

Normalizing, we obtain

P(d) ≈ (1/√(2πN)) exp(−d²/(2N)),

a Gaussian distribution.
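The Gaussian limit can be checked against the exact binomial probabilities. One subtlety: for N even, d takes only even values, so the Gaussian density must be doubled to match P(d) point by point. A sketch (N = 400 is an arbitrary choice):

```python
import math

# Exact binomial walk distribution P(d) at p = 1/2, versus the
# Gaussian limit exp(-d^2 / 2N) / sqrt(2 pi N).
N = 400

def P(d):
    n = (N + d) // 2  # number of +x steps giving displacement d
    return math.comb(N, n) * 0.5**N

def gauss(d):
    return math.exp(-d * d / (2 * N)) / math.sqrt(2 * math.pi * N)

for d in (0, 10, 20, 40):
    print(d, P(d), 2 * gauss(d))
# The factor of 2 accounts for d taking only even values here.
```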
Exercise 1.14 (Fluctuations in an ideal gas). In equilibrium, we expect an ideal
gas to be evenly distributed throughout its volume. On the other hand, microscopic
fluctuations do occur. In this exercise, we demonstrate why large fluctuations from
uniformity in an ideal gas are extremely rare.

(1) Consider a macroscopic ideal gas of N molecules confined to a volume V
and partition this volume into two equal subvolumes, a "left" and a "right".
What is the probability that there are m more molecules on the left than
on the right?
(2) Show that the probability distribution P(m) is Gaussian for m ≪ N. What
is the width of this Gaussian and what are the consequences for large
fluctuations?
Exercise 1.15 (Pressure in an ideal gas). In this exercise, we will rederive the
relationship between pressure and internal energy in an ideal gas, starting from

P = 〈F〉/A = nm〈v_x²〉.

(1) Let P(p_x) represent the probability that the x-component of a particular
molecule's momentum equals p_x. Constant total energy E in the gas con-
fines the momentum vector p to lie somewhere on a sphere of radius
R = √(2mE) in momentum space.
(2) Derive the Gaussian form of P(p_x). Do not use the equipartition theorem;
instead let β = 3N/(2E).
(3) Use this distribution to compute 〈v²〉 (the Gamma function is useful here);
substituting this value of β recovers the usual result.
7. The Statistical Basis for Entropy
In order to connect the collective behavior of the system with the dynamics of its
constituent particles, statistical mechanics makes a distinction between microscopic
and macroscopic states of the system which makes precise the content of the atomic
hypothesis: A microscopic state or microstate of the system expresses the precise
state of every single particle in the system. Thus, classically, each point (q, p) in
phase space represents a distinct microstate of the system. If a quantum mechanical
viewpoint is taken, however, a complete specification of the quantum numbers for
each particle replaces the specification of the qi and pi. A macroscopic state or
macrostate of the system, on the other hand, expresses only the values of large-
scale properties such as total energy, total number of particles, and volume occupied
or local density. In a macrostate, all individual particle state information has been
averaged out; a macrostate encodes far less information. In this and the following
sections, we will discuss how the distinction between macroscopic and microscopic
states leads to a statistical basis for the thermodynamic concept of entropy.
The microscopic description is always constrained by the macroscopic descrip-
tion. Finite energy, for instance, restricts trajectories to remain in bounded regions
of phase space. When total energy is conserved, trajectories are confined to a par-
ticular hypersurface. Microstates which are consistent with a given macrostate are
called accessible microstates. Let Ω represent the number of accessible microstates
associated with a particular macrostate. In the microcanonical ensemble, the ac-
cessible microstates are spread uniformly across an energy surface; Ω is therefore
in this case a function of energy alone, Ω(E). Furthermore, the probability of the
system being in any one of the Ω(E) different accessible microstates is

p = 1/Ω(E),

since these microstates are assumed to be occupied with equal probability. For a
classical system, Ω is simply the volume of the accessible region of phase space.
The basic idea is that the system is more likely to be found in macrostates with
larger multiplicities; entropy is simply another, additive, measure of this multiplicity.
If, on the other hand, E is allowed to vary, then the assumption of equal
a priori probabilities implies that the system is more likely to be found in those
macrostates with greater multiplicity. This is the connection between thermody-
namic entropy S and multiplicity,

(52) S(E) = f(Ω(E)).
We can determine the form of this function f using only the assumption that
entropy is additive and the rules of combinatorics (cf. Einstein): Consider two
non-interacting systems brought together. If the multiplicities for the two systems
are Ω1 and Ω2, respectively, then the multiplicity of the combined system is the
product,
(53) Ω = Ω1Ω2.
By assumption, the entropy of the combined system is the sum of the entropies of
the subsystems (since they are non-interacting),
(54) S = S1 + S2.
Then, (52)-(54) together imply

f(Ω1 Ω2) = f(Ω) = S = S1 + S2 = f(Ω1) + f(Ω2).

This condition requires that f be a logarithmic function of Ω,

(55) S = k log Ω.

The constant k is determined by the choice of units. Historically, entropy was
measured in units of joules per kelvin. This sets the value of k at

k_B = 1.381 × 10⁻²³ J/K.

k_B is known as Boltzmann's constant.
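The logarithm in (55) can be checked directly: multiplicities multiply while entropies add. A sketch using two hypothetical two-level (spin) systems, for which Ω is a binomial coefficient (the particle numbers below are arbitrary choices):

```python
import math

k = 1.381e-23  # Boltzmann's constant, J/K

def S(omega):
    """Entropy S = k log(Omega) for a macrostate of multiplicity Omega."""
    return k * math.log(omega)

# Two non-interacting spin systems: Omega = (N choose n_up) counts
# the microstates with a given number of up spins.
omega1 = math.comb(100, 40)
omega2 = math.comb(80, 10)

# Multiplicities multiply (53), so entropies add (54).
assert math.isclose(S(omega1 * omega2), S(omega1) + S(omega2))
```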
8. Thermodynamic Equilibrium
Consider again two subsystems brought into contact, except now allow energy
to flow between them. The total energy E of the combined system is equal to the
sum of the energies E1 and E2 of the two subsystems separately; note that E is
conserved if we assume that the total system is closed. Thus, the total entropy can
be written,
S(E) = S1(E1) + S2(E2) = S1(E1) + S2(E − E1).
The equilibrium condition that the entropy S be at a maximum requires

dS/dE1 = dS1/dE1 + (dS2/dE2)(dE2/dE1) = dS1/dE1 − dS2/dE2 = 0.
In thermal equilibrium, no heat flows between the two subsystems and we say that
they are at the same temperature. The equality,

dS1/dE1 = dS2/dE2,

derived above, therefore suggests that we define temperature in terms of this de-
rivative. The following definition is the convention:

(56) 1/T = dS/dE.
Thus, from the statistical mechanics viewpoint, temperature can be understood
as an energy cost for stealing entropy from a system. Note that, like entropy,
temperature is a purely statistical quantity which only has meaning for macroscopic
systems.
This viewpoint also sharpens our understanding of the approach to equilibrium.
Consider the same two subsystems in thermal contact, except assume that they are
not in equilibrium — i.e. their temperatures are not equal. Then the second law
of thermodynamics implies

(d/dt) S(E) = (d/dt) S1(E1) + (d/dt) S2(E2)
            = (dS1/dE1)(dE1/dt) + (dS2/dE2)(dE2/dt) > 0.
However, the total energy is conserved: E2 = E − E1, where E is constant. There-
fore,

dS/dt = (dS1/dE1 − dS2/dE2) dE1/dt = (1/T1 − 1/T2) dE1/dt > 0.
If the temperature of the second subsystem is greater than that of the first, T2 > T1,
then E1 must increase monotonically with time in order to satisfy the above in-
equality. Likewise, if T1 > T2, then E1 must decrease with time. Energy flows from
regions of higher temperature to regions of lower temperature, until thermal equi-
librium is reached.
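The argument above can be turned into a toy simulation: give each subsystem an assumed entropy function S_i = C_i log E_i (so that T_i = E_i/C_i), let energy flow at a rate proportional to 1/T1 − 1/T2, and watch the temperatures equalize while the total entropy rises. All parameters below are arbitrary choices:

```python
import math

# Two subsystems with entropies S_i = C_i log(E_i), exchanging energy
# at a rate proportional to (1/T1 - 1/T2), with T_i = E_i / C_i.
C1, C2 = 3.0, 5.0
E1, E2 = 10.0, 2.0  # subsystem 2 starts colder (T1 = 3.33, T2 = 0.4)
rate, dt = 0.5, 0.01

def total_S():
    return C1 * math.log(E1) + C2 * math.log(E2)

S_prev = total_S()
for _ in range(5000):
    dE1 = rate * (C1 / E1 - C2 / E2) * dt  # proportional to 1/T1 - 1/T2
    E1 += dE1
    E2 -= dE1                              # total energy conserved
    assert total_S() >= S_prev - 1e-12     # entropy never decreases
    S_prev = total_S()

print(E1 / C1, E2 / C2)  # the two temperatures converge
```

Energy flows out of the hotter subsystem and into the colder one, exactly as the sign argument in the text requires.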
In (or near) equilibrium, the macrostate of a system is determined by the total
energy E, the total number of particles N, and the volume V; all other macroscopic
quantities can be written as functions of these variables. Thus, given a function
of state, S(E, V, N), we can produce other functions of state such as E(S, V, N)
by inverting the variables. We have seen that temperature can be defined as a
derivative of this function; applying the inverse function theorem, we may rewrite
(56) as

(57) T = (∂E/∂S)_{N,V}.
The partial derivative in (57) reminds us that S depends on several thermodynamic
quantities; thus there are two other analogous expressions which might be derived
in the same way. Just as

dE1/dS1 = dE2/dS2

holds for our two systems in equilibrium, so too must the expressions

(58) dE1/dV1 = dE2/dV2,
(59) dE1/dN1 = dE2/dN2

hold as well. (58) is used to define pressure P and (59) is used to define chemical
potential µ:

(60) P = −(∂E/∂V)_{S,N};  µ = (∂E/∂N)_{S,V}.
The minus sign in the definition of P ensures that pressure is positive for ordinary
systems, whose energy decreases upon expansion at constant entropy. Here, as with
temperature, the second law of thermodynamics implies that the volumes of the
two systems change in a well-defined way, with one system expanding and forcing
the other to contract so as to equalize the pressures. From this perspective, pressure
can be understood as defining an energy cost for decreasing the volume. The
condition of equal pressures (58) is called mechanical equilibrium; in general a
non-equilibrium system settles into mechanical equilibrium much faster than thermal
equilibrium (REF). Similarly, particle flow between the two systems (if possible) is
driven by the difference in chemical potentials, and µ can be understood as defining
an energy cost for stealing particles from a system. The condition (59) is called
chemical equilibrium. We will have much more to say about the chemical
potential later on.
9. Does Entropy Increase?
How can irreversible macroscopic behavior, such as the increase of entropy dictated
by the second law, arise from microscopic laws that are reversible in time? We take
up this apparent paradox in this section.
10. Information Theory
References
R. Bowley and M. Sanchez, Introductory Statistical Mechanics 2nd ed., Claren-
don Press, Oxford (2001).
S. Chandrasekhar, "Stochastic Problems in Physics and Astronomy", Reviews
of Modern Physics, Vol. 15, No. 1, pp. 1-89 (1943).
C. Kittel and H. Kroemer, Thermal Physics 2nd ed., W. H. Freeman and Co.,
New York (1995).
Landau and Lifschitz
Onsager
Pathria
Reif