statistical mechanics - cornell...
Statistical Mechanics
Contents
Chapter 1. Ergodicity and the Microcanonical Ensemble
1. From Hamiltonian Mechanics to Statistical Mechanics
2. Two Theorems From Dynamical Systems Theory
3. Ergodicity and the Microcanonical Ensemble
4. Quantum Mechanics and Density Matrices
5. The Validity of the Ergodic Hypothesis
6. Statistical Fluctuations
7. The Statistical Basis for Entropy
8. Thermodynamic Equilibrium
9. Does Entropy Increase?
10. Information Theory
References
CHAPTER 1
Ergodicity and the Microcanonical Ensemble
The pressure in an ideal gas, recall, is proportional to the average kinetic energy
per molecule. Since pressure may be understood as an average over billions upon
billions of microscopic collisions, this simple relationship illustrates how statistical
techniques may be used to suppress information about what each individual molecule
is doing in order to extract information about what the molecules do on average as
a whole. Our first task, as we examine the foundations of statistical mechanics, is to
understand more precisely why this suppression is necessary and how exactly it is
to be accomplished. We must, therefore, begin by considering
the laws of microscopic dynamics. In physics, there are two choices here — the laws
of classical mechanics and the laws of quantum mechanics. Remarkably, the choice
is not important; in either case, detailed solutions to the dynamical equations are
completely unnecessary. We will consider both cases, but follow the classical route
through Hamiltonian mechanics first, as this provides the clearest introduction to
the structure of statistical mechanics. In this section, we will review the essential
elements of Hamiltonian mechanics and discuss the need for and basic elements of
a probabilistic framework...
1. From Hamiltonian Mechanics to Statistical Mechanics
Newton’s second law for a particle of mass m,
F_total = m q̈,
is a second-order ordinary differential equation. Therefore, given the instantaneous
values of the particle’s position q and momentum p = m q̇ at some time t = 0, the
particle’s subsequent motion is uniquely determined for all t > 0. For this reason,
the state of a classical system consisting of n configurational degrees of freedom can
be thought of as a point (q1, . . . , qn, p1, . . . , pn) in a 2n-dimensional space called the
phase space of the system. As the state evolves in time, this point will trace out
in phase space a trajectory defined by the tangent vector,
(1) v(t) = (q̇1(t), . . . , q̇n(t), ṗ1(t), . . . , ṗn(t)).
A Hamiltonian system evolves according to the canonical equations of motion,
(2) q̇i = ∂H(q, p, t)/∂pi,
(3) ṗi = −∂H(q, p, t)/∂qi,
where the function
H(q,p, t) = H(q1, . . . , qn, p1, . . . , pn, t)
is called the Hamiltonian of the system. These equations represent the full content
of Newtonian mechanics. Note that exactly one trajectory passes through each
point in the phase space; the classical picture is completely deterministic.
Example (single particle dynamics). Find the canonical equations of motion
for a single particle of mass m in an external potential V (q).
Solution. The Hamiltonian for this system is simply
H(q, p) = p²/2m + V (q),
which we recognize as the sum of the kinetic and potential energies. This leads to
the following dynamical equations:
q̇ = p/m,
ṗ = −∂V (q)/∂q.
A system of many interacting particles has a similar solution, though the potential
term becomes much more complicated.
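These equations are simple enough to integrate numerically. The following sketch is an illustration, not part of the text; the harmonic choice V (q) = kq²/2 and all parameter values are assumptions made for the example. It advances (q, p) with a symplectic Euler step and checks that the energy p²/2m + V (q) stays near its initial value:

```python
# Illustrative sketch: integrate the canonical equations q' = p/m,
# p' = -dV/dq for the assumed harmonic potential V(q) = k*q**2/2,
# using the symplectic (kick-then-drift) Euler method.
def integrate(q, p, m=1.0, k=1.0, dt=1e-3, steps=10_000):
    for _ in range(steps):
        p -= k * q * dt        # kick:  p' = -dV/dq = -k*q
        q += (p / m) * dt      # drift: q' = p/m
    return q, p

def energy(q, p, m=1.0, k=1.0):
    return p ** 2 / (2 * m) + 0.5 * k * q ** 2

q1, p1 = integrate(1.0, 0.0)
# A symplectic integrator keeps the energy close to its initial value.
assert abs(energy(q1, p1) - energy(1.0, 0.0)) < 1e-2
```

A symplectic method is chosen deliberately: it preserves phase-space volume exactly, a property of Hamiltonian flows that we will meet again below.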
We see, therefore, that the first canonical equation (2) generalizes the relationship
between velocity and momentum (in a more complicated system, the i-th momentum
may depend on several of the qi and q̇i). Similarly, the second canonical equation
(3) generalizes the rule that force may be expressed as a gradient of an energy
function.
In a Hamiltonian system, the time dependence of any function of the momenta
and coordinates
f = f(q1, . . . , qn, p1, . . . , pn, t)
can be written,
(4) df/dt = {f, H} + ∂f/∂t,
where {f, H} is the Poisson bracket of the function f and the Hamiltonian.
The Poisson bracket of two functions f1 and f2 with respect to a set of canonical
variables is defined as
(5) {f1, f2} = ∑j=1..n ( ∂f1/∂qj · ∂f2/∂pj − ∂f1/∂pj · ∂f2/∂qj ).
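Definition (5) is easy to check numerically. The sketch below is an illustration only (the sample Hamiltonian and the evaluation point are arbitrary choices); it evaluates the bracket for one degree of freedom with central finite differences and confirms the canonical relations {q, p} = 1 and {H, H} = 0:

```python
# Hedged sketch: numerically evaluate the Poisson bracket (5) for n = 1
# using central finite differences.
def poisson_bracket(f1, f2, q, p, h=1e-5):
    df1_dq = (f1(q + h, p) - f1(q - h, p)) / (2 * h)
    df1_dp = (f1(q, p + h) - f1(q, p - h)) / (2 * h)
    df2_dq = (f2(q + h, p) - f2(q - h, p)) / (2 * h)
    df2_dp = (f2(q, p + h) - f2(q, p - h)) / (2 * h)
    return df1_dq * df2_dp - df1_dp * df2_dq

H = lambda q, p: p ** 2 / 2 + q ** 2 / 2   # example Hamiltonian (m = k = 1)
# {q, p} = 1 for any canonical pair, and {H, H} = 0 for any H.
assert abs(poisson_bracket(lambda q, p: q, lambda q, p: p, 0.3, 0.7) - 1.0) < 1e-6
assert abs(poisson_bracket(H, H, 0.3, 0.7)) < 1e-9
```

The vanishing of {H, H} is the numerical face of energy conservation: by (4), any quantity whose bracket with H vanishes is a constant of the motion.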
The Poisson bracket is important in Hamiltonian dynamics because it is independent
of how the various coordinates and momenta are defined; that is, {u, v} takes
the same value for any set of canonical variables q and p. Furthermore, the canonical
equations of motion can be re-written in the following form,
(6) q̇i = {qi, H},
(7) ṗi = {pi, H}.
This is known as the Poisson bracket formulation of classical mechanics. It is impor-
tant to recognize that very similar expressions arise in quantum mechanics (we’ll
look at these in Section 4). Indeed, every classical expression involving Poisson
brackets has a quantum analogue employing commutators. This elegant correspon-
dence principle, first pointed out by Dirac, has deep significance for the relationship
between classical and quantum physics. It also provides our first glimpse of why
statistical mechanics transcends the details of the microscopic equations of motion.
For now, we return to the classical route into the heart of statistical mechanics...
Examining a physical system from the classical mechanical point of view, one
first constructs the canonical equations of motion and then integrates these from
known initial conditions to determine the phase trajectory. If the system of in-
terest involves a macroscopic number of particles, this approach condemns one to
numerical computations involving matrices of bewildering size. Yet system size is
not the major obstacle: The canonical equations of motion are in general nonlin-
ear and, as a result, small changes in system parameters or initial conditions may
lead to large changes in system behavior. In particular, neighboring trajectories in
many nonlinear systems diverge from one another at an exponential rate, a phe-
nomenon known as sensitive dependence on initial conditions or, more popularly,
as the butterfly effect, the idea being that a flap of a butterfly’s wings may make
the difference between sunny skies and snow two weeks later. Systems exhibiting
sensitive dependence on initial conditions are said to be chaotic. Calculations of
chaotic trajectories are intolerant of even infinitesimal errors, such as those aris-
ing from finite precision and uncertainties in the state of the system. Therefore,
setting aside the impractical integration problem of calculating a high-dimensional
phase trajectory, our necessarily incomplete knowledge of initial conditions in a
macroscopic system seriously compromises our ability to predict future evolution.
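Sensitive dependence is easy to exhibit numerically. The sketch below is an illustration not drawn from the text: it iterates Arnold’s cat map, a standard area-preserving toy model of chaotic Hamiltonian dynamics, from two initial conditions differing by one part in 10¹⁰ (the map and the offset are arbitrary choices); each iteration stretches separations by a factor of about 2.618:

```python
def cat(x, y, steps):
    # Arnold's cat map on the unit torus: (x, y) -> (2x + y, x + y) mod 1.
    for _ in range(steps):
        x, y = (2 * x + y) % 1.0, (x + y) % 1.0
    return x, y

a = cat(0.3, 0.4, 40)
b = cat(0.3 + 1e-10, 0.4, 40)   # same orbit, perturbed by one part in 10^10
separation = max(abs(a[0] - b[0]), abs(a[1] - b[1]))
# The tiny initial offset has been amplified by many orders of magnitude.
assert separation > 1e-6
```

After a few dozen iterations the two trajectories are effectively unrelated, which is exactly why finite-precision knowledge of initial conditions forecloses long-term prediction.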
Though the prospects for dealing directly with the phase trajectories of a macro-
scopic system of particles seem hopeless, it is not the case that we must discard
all knowledge of the microscopic physics of the system. There are many macro-
scopic phenomena which cannot be understood from a purely macroscopic point
of view. What is combustion? What determines whether a solid will be a metal
or an insulator? What are the energy sources in stellar and galactic cores? These
questions are best dealt with by appealing to various microscopic details. On the
other hand, given the success of the laws of thermodynamics, it is evident that
macroscopic systems exhibit a collective regularity where the exact details of each
particle’s motion and state are nonessential. This suggests that we may envision
the time evolution of macroscopic quantities in a Hamiltonian system as some sort
of average over all of the microscopic states consistent with available macroscopic
knowledge and constraints. For this reason, one abandons the mechanical approach
of computing the exact time evolution from a single point in phase space in favor
of a statistical approach employing averages over an entire ensemble of points in
phase space. This is accomplished as follows:
Consider a large collection of identical copies of the system, distributed in phase
space according to a known distribution function,
ρ(q, p, t) = ρ(q1, . . . , q3N , p1, . . . , p3N , t)  (n = 3N for N particles in three dimensions),
where
(8) ∫ ρ(q, p, t) dq dp = 1 for all t.
ρ(q,p, t) is the density in phase space of the points representing the ensemble,
normalized according to (8), and may be interpreted as describing the probability
of finding the system in various different microscopic states. Once ρ(q,p, t) is
specified, we can compute the probabilities of different values of any quantity f
which is a function of the canonical variables. We can also compute the mean value
〈f〉 of any such function f by averaging over the probabilities of different values,
(9) 〈f(t)〉 = ∫ f(q, p) ρ(q, p, t) dp dq.
Thus, instead of following the time evolution of a single system through many
different microscopic states, we consider at a single time an ensemble of copies of
the system distributed into these states according to probability of occupancy. This
shift is one of the cornerstones of statistical mechanics.
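The ensemble average (9) can be made concrete with a small Monte Carlo sketch. This is an illustration, not part of the text: we assume, purely for the example, a distribution ρ that factors into independent standard Gaussians in q and p, so that the oscillator energy f = (q² + p²)/2 has ensemble average 1:

```python
import random

# Illustrative Monte Carlo version of the ensemble average (9): draw
# phase-space points (q, p) from an assumed distribution rho and average
# f(q, p) over the samples.
random.seed(0)
f = lambda q, p: 0.5 * (q ** 2 + p ** 2)
samples = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200_000)]
avg = sum(f(q, p) for q, p in samples) / len(samples)
# For unit Gaussians, <q**2> = <p**2> = 1, so the ensemble average is 1.
assert abs(avg - 1.0) < 0.05
```

No trajectory was integrated here: the average is taken over copies of the system at a single instant, which is exactly the shift described above.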
Exercise 1.1. Derive equation (4). HINT: Use the chain rule
df/dt = ∑i (∂f/∂qi)(dqi/dt) + ∑i (∂f/∂pi)(dpi/dt) + ∂f/∂t.
Exercise 1.2. Show that H(q,p, t) is a constant of the motion if and only if
it does not depend explicitly on time.
Exercise 1.3. Show that the canonical equations of motion can be re-written
in the following form,
(10) q̇i = {qi, H},
(11) ṗi = {pi, H}.
This is known as the Poisson bracket formulation of classical mechanics.
Exercise 1.4. Compute the following Poisson brackets:
(1) {qi, qj}
(2) {qi, pj}
Are your results in any way familiar, given your knowledge of quantum mechan-
ics? If so, how do the interpretations of these results differ from their quantum
mechanical analogues?
Exercise 1.5. Show that the canonical equations of motion can be written in
the symplectic form,
ẋ = M ∂H(q, p, t)/∂x,
where x = [q, p] (what’s M in this expression?)
2. Two Theorems From Dynamical Systems Theory
One is often interested in general qualitative questions about a system’s dy-
namics, such as the existence of stable equilibria or oscillations. In discussing such
questions, mathematicians often speak of the flow of a dynamical system: Any
autonomous system of ordinary differential equations can be written in the form
(12) ẋ = f (x)
(changes of variables may be required if the equations involve second-order and
higher derivatives). If we interpret a general system of differential equations (12)
as representing a fluid in which the fluid velocity at each point x is given by the
vector f(x), then we may envision any particular point x0 as flowing along the
trajectory φ(x0) defined by the velocity field. More precisely, we define
φt(x0) = φ(x0, t),
where φ(x0, t) is a point on the trajectory φ(x0) passing through the initial condition
x0; φt maps the starting point x0 to its location after moving with the flow for a
time t. It is important to note that φt defines a map on the entire phase space —
we may envision the entire phase space flowing according to the velocity field defined
by (12). Indeed, we shall see in this section that this fluid metaphor is especially
appropriate in statistical mechanics.
The notion of the flow of a dynamical system very naturally accommodates a shift
towards considering how whole regions of phase space participate in the dynamics,
a shift away from the language of initial conditions and trajectories. This shift is
what enables mathematicians to state and prove general theorems about dynamical
systems. It also turns out that this shift provides the natural setting for several of
the central concepts of statistical mechanics. In the previous section, we motivated
a statistical framework in which, rather than follow the time evolution of a single
system, we consider at a single time an ensemble of copies of that system distributed
in phase space according to probability of occupancy. The main player in this new
framework is the distribution function ρ(q, p, t) describing the ensemble. ρ allows
us to take into account which states in phase space a system is
likely to occupy1. In this section, we examine how the ensemble interacts with the
flow defined by a set of canonical equations. It turns out that, in a Hamiltonian
system, the time evolution of ρ has several interesting properties, which are the
subject of two important theorems from dynamical systems theory.
We begin with a simple calculation of the rate of change of ρ. We know from
(4), which describes the time evolution of any function of the canonical variables q
and p, that
(13) dρ/dt = ∂ρ/∂t + {ρ, H}.
However, we also know from local conservation of probability that ρ must satisfy a
continuity equation,
(14) ∂ρ/∂t + ∇ · (ρv) = 0,
where
∇ = ( ∂/∂q1, . . . , ∂/∂qn, ∂/∂p1, . . . , ∂/∂pn )
is the gradient operator in phase space and v is defined in (1). Applying the chain
rule, we see that
(15) ∇ · (ρv) = {ρ, H} + ρ(∇ · v).
Since ∇ · v vanishes for a Hamiltonian system, (13) and (14) are equal and
therefore
(16) ∂ρ/∂t + {ρ, H} = 0
and
(17) dρ/dt = 0.
This result is known as Liouville’s theorem. The partial derivative term in (16)
expresses the change in ρ due to elapsed time dt, while the (∇ρ) · v = {ρ, H}
term expresses the change in ρ due to motion along the vector field a distance v dt.
1Mathematicians include this as part of a more general approach, called measurable dynamics,
which we need not go into here.
Thus, Liouville’s theorem tells us that the local probability density — as seen by
an observer moving with the flow in phase space — is constant in time; that is, ρ is
constant along phase trajectories. The theorem can also be interpreted as stating
that, in a Hamiltonian system, phase space volumes are conserved by the flow or,
equivalently, that ρ moves in phase space like an incompressible fluid.
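Liouville’s theorem can be checked numerically. In the sketch below (an illustration; the pendulum Hamiltonian H = p²/2 − cos q and all parameters are assumptions made for the example), a symplectic leapfrog scheme approximates the time-T flow map, and a finite-difference Jacobian confirms that the map preserves phase-space area (determinant equal to 1):

```python
import math

# Sketch: verify numerically that the Hamiltonian flow of the pendulum,
# H = p**2/2 - cos(q), preserves phase-space volume (Liouville's theorem).
def flow(q, p, dt=1e-3, steps=2000):
    for _ in range(steps):          # leapfrog (kick-drift-kick) integration
        p -= math.sin(q) * dt / 2   # half kick:  p' = -dV/dq = -sin(q)
        q += p * dt                 # full drift: q' = p
        p -= math.sin(q) * dt / 2   # half kick
    return q, p

def jacobian_det(q, p, h=1e-6):
    # Finite-difference Jacobian of the time-T flow map (q0, p0) -> (q, p).
    qp_plus, qp_minus = flow(q + h, p), flow(q - h, p)
    pq_plus, pq_minus = flow(q, p + h), flow(q, p - h)
    dq_dq = (qp_plus[0] - qp_minus[0]) / (2 * h)
    dp_dq = (qp_plus[1] - qp_minus[1]) / (2 * h)
    dq_dp = (pq_plus[0] - pq_minus[0]) / (2 * h)
    dp_dp = (pq_plus[1] - pq_minus[1]) / (2 * h)
    return dq_dq * dp_dp - dq_dp * dp_dq

# Phase-space area is conserved: the Jacobian determinant is 1.
assert abs(jacobian_det(0.8, 0.3) - 1.0) < 1e-3
```

A dissipative system would give a determinant less than 1, with trajectories contracting onto an attractor; the unit determinant here is the numerical signature of incompressible flow.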
From the incompressible fluid analogy, we see that while Hamiltonian systems
can exhibit chaotic dynamics, they cannot have any attractors! Liouville’s theorem
has other important consequences when combined with system constraints, such
as conservation laws. Conservation laws constrain the flow to lie on families of
hypersurfaces in phase space. These surfaces are bounded and invariant under the
flow:
φt(S) = S
for each hypersurface S defined by a conservation law. When volume-preserving
flows are restricted to bounded, invariant regions of phase space, a surprising result
emerges: Let X be a bounded region of phase space which is invariant under a
volume-preserving flow. Take any region S which occupies a finite fraction of the
total volume in X (this specifically excludes what mathematicians call sets of mea-
sure zero: sets with no volume). Then any randomly selected initial condition x in
S generates a trajectory φt(x) which returns to S infinitely often — this is known
as the Poincare recurrence theorem.
In order to understand where this theorem comes from and what it means, we
consider how the region S moves under the flow. Define a function f which maps
S along the flow for a time T ,
f(S) = φT (S).
Subsequent iterations of this time-T map produce a sequence of subsets of X,
f²(S) = φ2T(S), f³(S) = φ3T(S), and so on, all with the same finite volume in X (the
flow is volume-preserving). Since X itself has finite volume, these subsets cannot all
be disjoint: iterating enough times would otherwise exhaust all of the volume in X.
As a result, two of these subsets must intersect; i.e., there must exist integers i and
j, with i > j, such that fⁱ(S) ∩ fʲ(S) is non-empty. This implies that fⁱ⁻ʲ(S) ∩ S is
also non-empty. S must fold back on itself repeatedly
under this time-T flow map. By considering small subsets of S, which must also have
this property, we can convince ourselves that a randomly selected point in S does
indeed return to S infinitely often (for a precise proof of the theorem, see references
at end of chapter). The Poincare recurrence theorem as stated implies that almost
every initial condition x0 in the bounded region X generates a trajectory which
returns arbitrarily close to x0 infinitely many times. This recurrence property is
truly remarkable when you consider the bewildering array of nonlinear Hamiltonian
systems to which it may be applied. Indeed, the Poincare recurrence theorem is
considered the first great theorem of modern dynamics; we will have more to say
about its role in statistical mechanics later on.
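A finite sketch of recurrence (an illustration, not part of the theorem’s proof): Arnold’s cat map, (x, y) → (2x + y, x + y) mod 1, preserves area on the torus, and on the grid of rational points with denominator N it simply permutes a finite set, so every orbit is periodic and returns to its starting point again and again:

```python
# Sketch of Poincare recurrence in a measure-preserving map. Arnold's cat
# map preserves area on the torus; restricted to the finite grid of points
# (k/N, l/N) it is a permutation of a finite set, so every orbit is
# periodic: the initial point recurs infinitely often.
N = 10                       # exact integer arithmetic on the grid (k/N, l/N)
start = (1, 2)               # represents the point (0.1, 0.2)
x, y = start
returns = 0
for _ in range(1000):
    x, y = (2 * x + y) % N, (x + y) % N
    if (x, y) == start:
        returns += 1
assert returns >= 2          # the orbit returns to its start repeatedly
```

The continuous theorem is subtler (returns are to a neighborhood, not an exact point, and hold only for almost every initial condition), but the volume-counting mechanism is the same.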
3. Ergodicity and the Microcanonical Ensemble
Liouville’s Theorem has profound consequences for a system in equilibrium.
An ensemble is said to be stationary if the probability density does not depend
explicitly on time,
(18) ∂ρ(q, p, t)/∂t = 0.
This restriction guarantees that all ensemble averages will be time-independent;
we therefore expect that systems in equilibrium can be represented by stationary
ensembles. Note that a stationary ensemble satisfying Liouville’s Theorem (17)
has a vanishing Poisson bracket with the Hamiltonian,
(19) {ρ(q, p), H(q, p)} = 0.
Since {qi, pj} = δij (where δij is the Kronecker delta), no function of q
or p alone will satisfy (19). The general solution for a stationary ensemble has the
form
ρ(q, p) = ρ(H(q, p)).
The simplest example of a stationary ensemble is the microcanonical ensem-
ble, for which the distribution ρ(q, p) is at all times uniformly distributed over all
accessible microstates defined by constant energy. The assumption that the micro-
canonical ensemble is valid is one of the cornerstones of statistical mechanics and
is known as the postulate of equal a priori probabilities.
—————————————————————————————————-
Birkhoff pointwise ergodic theorem: For almost all x,
lim T→∞ (1/T) ∫0^T f(φt(x)) dt = f∗(x).
The important statement here is that functions of dynamical variables, when
averaged along trajectories, converge almost everywhere to something. The limit
may depend on x, which is why f∗ is written above as a function of x and is why
mathematicians call this a “pointwise” theorem, but the limit almost always exists.
(We have to say almost always because...). Moreover,
f∗(φt(x)) = f∗(x)
(f∗ is invariant under the flow) and
∫ f∗ ρ dx = ∫ f ρ dx
(the ensemble average of f equals the ensemble average of the limiting time average
of f; somehow the time averaging doesn’t affect the ensemble average).
Definition: The flow is ergodic if every invariant set S (φt(S) = S) has m(S) = 0
or 1 (the only invariant sets are those with the volume of the entire space and
those with zero volume). Loosely put, this means that almost every trajectory
wanders almost everywhere (on its energy surface). Compare this with the Poincare
recurrence theorem: ergodicity is a far stronger property than mere recurrence, and
many physics textbooks blur this distinction.
One more theorem, stated without proof: the flow is ergodic if and only if every
invariant function — every f with f(φt(x)) = f(x) — is constant (almost everywhere).
This last theorem leads us to the microcanonical ensemble. Since ρ is invariant
under the flow, by Liouville’s theorem, ergodicity means ρ is constant (on the energy
surface). So the assumption of equal a priori probabilities is really just the
assumption that the system is ergodic; this foray into dynamics helps us clarify
what we’re assuming!
Now back to a general f for an ergodic system. We know f∗ must be a constant
(almost everywhere). We can actually compute this constant by integrating over
the ensemble:
∫ f∗ ρ = f∗ ∫ ρ = f∗.
But, by (9), this is just the ensemble average of f: the time average equals the
ensemble average, which is what we are really after in statistical mechanics.
We can use this to justify a constant ρ as follows: applying the result to the
indicator function of a region shows that trajectories spend, on average, the same
amount of time in every region of equal volume.
—————————————————————————————————-
For a stationary ensemble, the ensemble average 〈f〉 of (9) is time-independent.
Thus the time average of 〈f〉 equals 〈f〉,
(20) 〈f〉 = (1/T) ∫0^T 〈f〉 dt.
Switching the order of integration, it follows that 〈f〉 equals the ensemble average
of the time-average of f(q, p),
(21) 〈f〉 = ∫ ( (1/T) ∫0^T f(p, q) dt ) ρ(p, q) dp dq.
If the averaging time T is short, the right-hand side of this equation clearly depends
on the particular microscopic states occupied. However, for a microcanonical en-
semble and a sufficiently long time T , the average will turn out to depend only
on the macroscopic constraints. The reason for this is that a probability density
spread uniformly throughout the accessible region of phase space implies that a
phase trajectory confined to this region wanders uniformly as well; that is, given
sufficient time, the trajectory enters any neighborhood of every point in the region
— mathematicians call this the ergodic theorem. Therefore, for time T sufficiently
large, the time-average of f(p, q) is the same for every member of the ensemble.
From this follows the major result, glimpsed earlier, that long-time averages equal
ensemble averages,
(22) (1/T) ∫0^T f(p, q) dt = ∫ f(p, q) ρ(p, q) dp dq,
independent of initial and final states.
We can clarify the interplay of these statistical ideas in the following summary,
due to Landau and Lifshitz: The ergodic theorem states that after a sufficiently long
time the system’s phase trajectory will return to any neighborhood of an allowed
point in phase space. Let ∆T represent the small part of the total time T that the
trajectory spends in a small phase volume ∆q∆p. If we combine ergodicity with the
microcanonical ensemble, then this trajectory spends on average the same amount
of time everywhere —
(23) lim T→∞ ∆T/T = w,
where w is some fixed proportion. It follows from (23) and the definition of the
density ρ that
(24) dw = ρ(q, p) dq dp.
Combining (9) and (24),
(25) ensemble average = ∫ f(p, q) ρ(p, q, t) dp dq = ∫ f(p, q) dw
= lim T→∞ (1/T) ∫0^T f(p, q) dt = time average,
we see that, indeed, statistical averaging over the ensemble at fixed time is equivalent
to time-averaging a single member of the ensemble. This is what allowed the loose
use of averaging in our discussion of pressure in an ideal gas to work; this is what
allowed us to ignore the time evolution and only consider what a typical gas molecule
was doing on average. Furthermore, to the extent that all measurements in the
lab are time averages, ergodicity and the microcanonical ensemble firmly ground
macroscopic measurements in the microscopic statistical dynamics of the system
being investigated.
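The equality of time and ensemble averages in (25) can be observed directly in a toy ergodic system. The sketch below is an illustration, not from the text: rotation of the circle by the irrational angle √2 is ergodic with respect to the uniform measure, so the time average of f(x) = cos²(2πx) along a single orbit should approach the ensemble average ∫0^1 f dx = 1/2:

```python
import math

# Sketch: the time average along one orbit of an ergodic map equals the
# ensemble average over the invariant (here uniform) measure.
alpha = math.sqrt(2)                         # irrational rotation number
f = lambda x: math.cos(2 * math.pi * x) ** 2
x, total, T = 0.1, 0.0, 100_000
for _ in range(T):
    total += f(x)
    x = (x + alpha) % 1.0                    # rotate the circle by alpha
time_avg = total / T
# Ensemble average of cos(2*pi*x)**2 over the uniform measure is 1/2.
assert abs(time_avg - 0.5) < 1e-2
```

The circle rotation is far simpler than a Hamiltonian energy surface, but the logic is the same: one long trajectory samples the invariant measure, so the clock average reproduces the ensemble average.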
4. Quantum Mechanics and Density Matrices
In classical physics, the state of a system at some fixed time t is uniquely
defined by specifying the values of all of the generalized coordinates qi(t) and mo-
menta pi(t). In quantum mechanics, however, the Heisenberg uncertainty principle
prohibits simultaneous measurements of position and momentum to arbitrary precision.
We might therefore anticipate some revisions in our approach. It turns
out, however, that the classical ensemble theory developed above carries over into
quantum mechanics with hardly any revision at all. Most of the necessary alterations
are built directly into the edifice of quantum mechanics and all we need is to find
suitable quantum mechanical replacements for the density function ρ(q, p) and Li-
ouville’s Theorem. Understanding this is the goal of this section. Readers who are
unfamiliar with Dirac notation and the basic concepts of quantum mechanics are
referred to the references at the end of the chapter.
The uncertainty principle renders the concept of phase space meaningless in
quantum mechanics. The quantum state of a physical system is instead repre-
sented by a state vector, |ψ〉, belonging to an abstract vector space called the
state space of the system. The use of an abstract vector space stems from the
important role that superposition of states plays in quantum mechanics — lin-
ear combinations of states provide new states and, conversely, quantum states can
always be decomposed into linear combinations of other states. The connection
between these abstract vectors and experimental results is supplied by the formal-
ism of linear algebra, by operators and their eigenvalues. Dynamical variables,
such as position and energy, are represented by self-adjoint linear operators on the
state space and the result of any measurement made on the system is always rep-
resented by the eigenvalues of the appropriate operator (that is, the eigenvectors
of an observable physical quantity form a basis for the entire state space). This
use of operators and eigenvalues directly encodes many of the distinct hallmarks of
quantum mechanical systems: Discretization, such as that of angular momentum or
energy observed in numerous experiments, simply points to an operator
with a discrete spectrum of eigenvalues. And wherever the order in which several
different measurements are made may affect the results obtained, the associated
quantum operators do not commute.
In quantum mechanics, the time evolution of the state vector is described by
Schrodinger’s equation,
(26) iℏ (∂/∂t)|ψ(t)〉 = H(t) |ψ(t)〉,
where H(t) is the Hamiltonian operator for the system; this evolution law replaces
the canonical equations of classical mechanics.
Exercise 1.6 (single particle dynamics). Write down, using wavefunctions
ψ(q, t), Schrodinger’s equation for a single particle of mass m in an external po-
tential V (q).
Solution. Recall that the classical Hamiltonian for this system is simply
H(q, p) = p²/2m + V (q).
We transform this into a quantum operator by replacing q and p with the appropriate
quantum operators: q is the position operator and
p = (ℏ/i) ∂/∂q
is the momentum operator for a wavefunction ψ(q, t). Then, Schrodinger’s equation
(26) becomes the following partial differential equation,
(27) iℏ ∂ψ(q, t)/∂t = ( −(ℏ²/2m) ∇² + V (q) ) ψ(q, t).
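Equation (27) can be checked against a known solution. As an illustration not taken from the text (the harmonic choice V (q) = q²/2 with ℏ = m = 1 is an assumption made for the example), the Gaussian ψ(q) = e^(−q²/2) is an energy eigenstate with E = 1/2, so applying the right-hand side of (27) should reproduce Eψ pointwise:

```python
import math

# Sketch: for V(q) = q**2/2 with hbar = m = 1, psi(q) = exp(-q**2/2) is an
# energy eigenstate with E = 1/2. Check H psi = E psi pointwise using a
# finite-difference second derivative.
def H_psi(psi, q, h=1e-4):
    d2 = (psi(q + h) - 2 * psi(q) + psi(q - h)) / h ** 2   # psi''(q)
    return -0.5 * d2 + 0.5 * q ** 2 * psi(q)

psi = lambda q: math.exp(-q ** 2 / 2)
for q in (-1.0, 0.0, 0.5, 2.0):
    assert abs(H_psi(psi, q) - 0.5 * psi(q)) < 1e-5
```
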
Schrodinger’s equation has a number of nice properties. First, as a linear
equation, it directly expresses the principle of superposition built into the vector
structure of the state space — linear combinations of solutions to (26) provide new
solutions. In addition, it can be shown that the norm of a state vector, 〈ψ(t)|ψ(t)〉,
is invariant in time; this turns out to have a nice interpretation in terms of local
conservation of probability. On the other hand, the Schrodinger equation is not easy
to solve directly. Even a system as simple as the one-dimensional harmonic os-
cillator requires great dexterity. For a macroscopic system, (26) generates either
an enormous eigenvalue problem or a high-dimensional partial differential equation
(consider the generalization of (27) to a many-body system). Either way, we see
that direct solution is hopeless. The situation is essentially identical with that of
macroscopic classical mechanics — the mathematics and, more importantly, our
lack of information about the microscopic state (quantum numbers, in this case)
necessitate a statistical approach.
We would like to find a quantum mechanical entity that replaces the classical
probability density ρ(q,p), which uses probabilities to represent our ignorance of
the true state of the system. Unfortunately, the usual interpretation of quantum
mechanics already employs probabilities on a deeper level: If the measurement of
some physical quantity A in this system is made a large number of times (i.e. on
a large ensemble of identically prepared systems), the average of all the results
obtained is given by the expectation value
(28) 〈A〉 = 〈ψ|A|ψ〉,
provided the quantum state |ψ(t)〉 is properly normalized to satisfy 〈ψ(t)|ψ(t)〉 = 1.
In order to understand the consequences of this, we introduce a basis of eigenstates
for the operator A. Let |ai〉 be the eigenvector corresponding to the eigenvalue ai.
Since the |ai〉 form a basis, we can expand the identity operator as follows,
(29) 1 = ∑i |ai〉〈ai|.
Inserting this operator into (28) twice, we obtain
(30) 〈A〉 = ∑i ai |〈ai|ψ〉|².
Comparing this result to the definition of the expectation value,
(31) 〈A〉 = ∑i ai p(ai),
we see that |〈ai|ψ〉|² must be interpreted as representing the probability p(ai) of
obtaining ai as the result of the measurement. This probabilistic framework replaces
the classical notion of a dynamical variable having a definite value. While the ex-
pectation value of A is a definite quantity, particular measurements are indefinite
— in quantum mechanics we can only talk about the probabilities of different out-
comes of an experiment. Now we can introduce an ensemble. Instead of considering
a single state |ψ〉, let pk represent the probability of the system being in a quantum
state represented by the normalized state vector |ψk〉. If the system is actually in
state |ψk〉, then the probability of measuring ai is simply |〈ai|ψk〉|². If, however,
we are uncertain about the true state then we have to average over the ensemble.
In this case, the total probability of measuring ai is given by
(32) p(ai) = ∑k pk |〈ai|ψk〉|² = 〈ai| ( ∑k |ψk〉 pk 〈ψk| ) |ai〉.
The object in parentheses in this last expression,
(33) ρ = ∑k |ψk〉 pk 〈ψk|,
is known as the density operator. (33) turns out to be exactly what we’re look-
ing for, the quantum mechanical operator corresponding to the classical density
function ρ(q, p). Recall that the classical density satisfies the following properties:
(1) Non-negativity of probabilities: ρ(q, p) must be non-negative for all points
in the phase space.
(2) Normalization of probabilities: ∫ ρ(q, p) dq dp = 1.
(3) Expectation values: The average value of a dynamical variable A(p, q)
across the entire ensemble represented by ρ(q, p) is given by
〈A〉 =∫A(q, p)ρ(q, p) dq dp.
These properties carry over into the quantum mechanical setting, with appropriate
modification (see exercises). In particular, it can be shown that
〈A〉 = trace(Aρ).
Apart from traces over a density operator replacing integration over the classical
ensemble, the statistical description of a complex quantum system is essentially no
different than that of a complex classical system. The time evolution of the density
operator ρ will be given by a quantum version of Liouville’s Theorem and will lead
to the same notions of a microcanonical ensemble and ergodicity.
First, we derive the quantum evolution law for ρ. Using the product rule, we can
write
(34) iℏ ∂ρ/∂t = ∑k iℏ [ (∂/∂t |ψk〉) pk 〈ψk| + |ψk〉 pk (∂/∂t 〈ψk|) ].
Substituting the Schrodinger equation (26) and its adjoint, this reduces to
(35) iℏ ∂ρ/∂t = ∑k [ (H|ψk〉) pk 〈ψk| − |ψk〉 pk (〈ψk|H) ] = Hρ − ρH.
Thus,
(36) ∂ρ/∂t = −(1/iℏ) [ρ, H],
where [ρ, H] = ρH − Hρ is called the commutator of ρ and H. Note the striking
resemblance between (36) and Liouville’s Theorem — the commutator of the density
and Hamiltonian operators has replaced the classical Poisson bracket of the density
and Hamiltonian functions but the expressions are otherwise identical. This is a
special case of a correspondence first pointed out by Dirac:
classical Poisson bracket, {u, v} −→ quantum commutator, (1/iℏ) [u, v].
As in the classical setting, a stationary ρ should be independent of time; for an
equilibrium quantum system, ρ must therefore be a function of the Hamiltonian,
ρ(H). The simplest choice is again a uniform distribution,
(37) ρ = ∑k |ψk〉 (1/n) 〈ψk|,
where n is the number of states |ψk〉 in the ensemble. This is the quantum micro-
canonical ensemble. It is essentially the same as the classical one, except discrete.
The same statistical principles apply; we simply switch to a discretized formalism,
with traces over operators replacing integrals over phase space.
Exercise 1.7. Show that the eigenvalues of the density operator are non-
negative.

Solution. Let ρ′ represent any eigenvalue of ρ and let |ρ′〉 be the eigenvector
associated with this eigenvalue. Then

Σ_k |ψ_k〉 p_k 〈ψ_k|ρ′〉 = ρ|ρ′〉 = ρ′|ρ′〉.

Multiplying on the left by 〈ρ′|, we obtain

Σ_k p_k |〈ψ_k|ρ′〉|² = ρ′〈ρ′|ρ′〉.

It follows that, since the p_k are positive and 〈ρ′|ρ′〉 is non-negative, ρ′ cannot be
negative. Since the eigenvalues of ρ play the role of the classical probabilities, this
result mirrors property (1) above.
Exercise 1.8. Show that the matrix representation of ρ in any basis satisfies

(38) trace ρ = 1.

Solution. Consider a basis of eigenstates |a_i〉 of the operator A. The matrix
elements ρ_ij = 〈a_i|ρ|a_j〉 are the representation of ρ in this basis. Then,

trace ρ = Σ_i 〈a_i|ρ|a_i〉 = Σ_i Σ_k p_k |〈ψ_k|a_i〉|²
        = Σ_k p_k (Σ_i |〈ψ_k|a_i〉|²) = Σ_k p_k = 1.

Since the trace is invariant under a change of basis, this result holds for any basis.
The condition trace ρ = 1 should be compared to the normalization property (2)
above.
Exercise 1.9. Show that, in a quantum ensemble represented by the operator
ρ, the expectation value of an operator A satisfies

(39) 〈A〉 = trace(Aρ).

Solution.

〈A〉 = Σ_k p_k 〈ψ_k|A|ψ_k〉 = Σ_{k,i} p_k 〈ψ_k|a_i〉〈a_i|A|ψ_k〉
    = Σ_{i,k} 〈a_i|A|ψ_k〉 p_k 〈ψ_k|a_i〉 = Σ_i 〈a_i|Aρ|a_i〉 = trace(Aρ).

This result should be compared to the classical definition of expectation value, prop-
erty (3) above.
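The three properties established in Exercises 1.7-1.9 are easy to confirm numerically for a randomly generated ensemble. A small sketch, assuming numpy (the dimension, probabilities, and observable below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_state(dim):
    """A normalized random ket, as a complex vector."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

# An ensemble of three states with probabilities p_k summing to 1.
dim = 5
p = np.array([0.5, 0.3, 0.2])
states = [random_state(dim) for _ in p]

# rho = sum_k |psi_k> p_k <psi_k|
rho = sum(pk * np.outer(v, v.conj()) for pk, v in zip(p, states))

# Eigenvalues of rho are non-negative (Exercise 1.7).
eigvals = np.linalg.eigvalsh(rho)
assert np.all(eigvals > -1e-12)

# trace(rho) = 1 (Exercise 1.8).
assert np.isclose(np.trace(rho).real, 1.0)

# <A> = trace(A rho) (Exercise 1.9), checked against the direct
# ensemble average sum_k p_k <psi_k|A|psi_k>.
B = rng.normal(size=(dim, dim))
A = B + B.T  # a Hermitian observable
direct = sum(pk * (v.conj() @ A @ v) for pk, v in zip(p, states))
assert np.isclose(np.trace(A @ rho), direct)
```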
5. The Validity of the Ergodic Hypothesis

One important feature of Hamiltonian dynamics is the equal status given to
coordinates and momenta as independent variables, which allows a great deal of
freedom in selecting which quantities to designate as coordinates and momenta (the
q_i and p_i are often called generalized coordinates and momenta). Any set of
variables which satisfies the canonical equations (2-3) is called a set of canonical
variables. One may transform between different sets of canonical variables; these
changes of variables are called canonical transformations. Note that while the
form of the Hamiltonian depends on how the chosen set of canonical variables is
defined, the form of the canonical equations is by definition invariant under canonical
transformations.
Hamiltonian systems have a great deal of additional structure. The quantity,

(40) ∮_γ p · dq = Σ_{i=1}^{n} ∮_γ p_i dq_i,

known as Poincaré's integral invariant, is independent of time if the evolution of
the closed path γ follows the flow in phase space. The left-hand side of (40) is also
known as the symplectic area. This result can be generalized if we extend our phase
space by adding a dimension for the time t. Let Γ_1 be a closed curve in phase space
(at fixed time) and consider the tube of trajectories in the extended phase space
passing through points on Γ_1. If Γ_2 is another closed curve in phase space enclosing
the same tube of trajectories, then

(41) ∮_{Γ_1} (p · dq − H dt) = ∮_{Γ_2} (p · dq − H dt).

This result, that the integral ∮ (p · dq − H dt) takes the same value along any two
paths around the same tube of trajectories, is called the Poincaré-Cartan integral
theorem. Note that if both paths are taken at fixed time, then (41) simply reduces
to (40).
Structure of this sort, as well as the presence of additional invariant quantities,
greatly constrains the flow in phase space, and one may wonder whether this struc-
ture is compatible with the ergodic hypothesis and the microcanonical ensemble.
The most extreme illustration of the conflict is the special case of integrable Hamil-
tonian systems. A time-independent Hamiltonian system is said to be integrable
if it has n independent global constants of the motion (one of which is the Hamil-
tonian itself), no two of which have a non-zero Poisson bracket. The existence
of n invariants confines the phase trajectories to an n-dimensional subspace
(recall that the entire phase space is 2n-dimensional; this is a significant reduction
of dimension). The independence of these invariants guarantees that none can be
expressed as a function of the others. The last condition, that no two of the in-
variants have a non-zero Poisson bracket, restricts the topology of the manifold to
which the trajectories are confined — it must be an n-dimensional torus. A canonical
transformation to what are known as action-angle variables, for which

I_i = (1/2π) ∮_{γ_i} p · dq

provides the canonical momenta and the angle θ_i around the loop γ_i provides the
canonical coordinates, simplifies the description immensely: Each I_i determines a
frequency for uniform motion around the loop defined by γ_i, generating tra-
jectories which spiral uniformly around the surface of the n-torus. For most choices
of the I_i, a single trajectory will fill up the entire torus; this is called quasi-
periodic motion. The microcanonical ensemble, for which the trajectories wander
ergodically on a (2n − 1)-dimensional energy surface, captures none of this struc-
ture. On one hand, highly structured Hamiltonian systems appear to exist in
Nature, the premier example being our solar system. On the other hand, we have
the remarkable success of statistical mechanics (and its underlying hypotheses
of ergodicity and equal a priori probabilities) in providing a foundation for thermo-
dynamics and condensed matter physics. This success remains a mystery.
6. Statistical Fluctuations

In this section, we consider some of the statistical consequences of having a
large number N of particles in a physical system. In particular, we examine two
important theorems from probability theory, the law of large numbers and the
central limit theorem. We also derive the important result that, relative to the
size of the system, the size of characteristic fluctuations falls off like 1/√N as N
increases. This result has consequences for the thermodynamic uniformity observed
in macroscopic systems and will lead us, in the next section, to the concept of
entropy. The central limit theorem is a very deep result responsible for the prevalence
of normal distributions in nature.
There are two prototypical systems used in physics to introduce the topic of
fluctuations — coin flipping and the one-dimensional random walk — and these
two systems are, in fact, mathematically identical. In a random walk, we imagine a
particle which takes discrete steps in randomly chosen directions, where every step
is totally uncorrelated with all previous steps. Though some texts use the colorful
analogy of a drunk man stumbling around, the physical motivation for interest
in the random walk grew out of the problem of Brownian motion, which we will
examine in a later chapter. For a one-dimensional random walk, there are only two
possibilities, a step to the “right” and a step to the “left”, and it is usually assumed
that all steps have equal size s. Let p be the probability of a step in the +x direction
and q = 1 − p be the probability of a step in the −x direction. After a total of N steps,
how far will the particle have moved? If n represents the number of steps (out of
N total) taken in the +x direction, then the net distance traveled d is given by
d = ns− (N − n)s = (2n−N)s
and we can show that the probability of traveling this far is given by

(42) P(n) = [N!/(n!(N − n)!)] p^n (1 − p)^(N−n).

This result is known as the binomial distribution, since P(n) represents a typical
term in the binomial expansion

(43) (p + q)^N = Σ_{n=0}^{N} [N!/(n!(N − n)!)] p^n q^(N−n).
The derivation of (42) involves an understanding of the use of factorials in combi-
natorics and is left to the exercises.
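Equation (42) can also be checked against a direct Monte Carlo simulation of the walk. A small sketch (the parameters N = 20, p = 0.3, and the number of trials are arbitrary choices):

```python
import math
import random

random.seed(42)

N, p = 20, 0.3
trials = 200_000

# Empirical distribution of n, the number of +x steps out of N.
counts = [0] * (N + 1)
for _ in range(trials):
    n = sum(random.random() < p for _ in range(N))
    counts[n] += 1

# Exact binomial probabilities, equation (42).
def binom(n):
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

for n in (4, 5, 6, 7, 8):
    print(n, counts[n] / trials, round(binom(n), 4))
# The empirical frequencies track the exact probabilities.
```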
Coin flipping, a more everyday example of a random system, leads to the same
probability distribution: If we let p represent the probability of obtaining the result
“heads” in a single coin flip, then the probability of obtaining n heads in N inde-
pendent coin flips is given by the binomial formula (42). We all have the intuitive
sense that a fair coin, that is, one with p = 0.5, lands heads half of the time, on
average. Many people believe in a “law of averages” operating here — fluctuations
in repeated trials tend to disappear after many trials; the fluctuations “average
out”. This popular intuition is responsible for many bad gambling decisions. It is
not true that after obtaining a long string of heads, additional flips are more likely
to land tails in order to even things out; a fair coin always has a 50-50 chance of
landing heads, regardless of the results of all previous flips. It is true however that,
of all the possible outcomes for a large number of coin flips, landing all heads is
unlikely and becomes more unlikely as the number of trials considered increases.
The precise statement of this result is known to mathematicians as the law of
large numbers:
In repeated, independent trials with the same probability p of
success in each trial, the probability that the percentage of suc-
cesses n/N differs from p by more than a fixed amount, ε > 0,
converges to zero as the number of trials N goes to infinity. This
holds for every positive ε.
It is important to note that significant deviations between the percentage of suc-
cesses and the chance of success p can occur. Furthermore, the law applies only
to the results of many trials considered together; it says nothing about short-term
fluctuations. The law of large numbers is quite useful in computations involving
a large number of random trials, where we may exploit the relation between the
percentage of successes and the probability of success to great effect.
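The law of large numbers, as stated above, can be illustrated by estimating P(|n/N − p| > ε) for growing N. A quick simulation sketch (the value of ε and the number of repetitions are arbitrary choices):

```python
import random

random.seed(0)
p, eps = 0.5, 0.02
experiments = 1000  # independent repetitions at each N

def deviation_prob(N):
    """Estimate P(|n/N - p| > eps) over many independent runs."""
    bad = 0
    for _ in range(experiments):
        n = sum(random.random() < p for _ in range(N))
        if abs(n / N - p) > eps:
            bad += 1
    return bad / experiments

results = {}
for N in (100, 400, 1600, 6400):
    results[N] = deviation_prob(N)
    print(N, results[N])
# The estimated probability of a deviation larger than eps falls
# toward zero as the number of trials N grows.
```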
The law of large numbers tells us to expect small fluctuations in a large system.
We'd like to know how small, however. Two of the most important quantities to
know about a probability distribution P(n) are the mean and variance of n. The
mean of n is the expectation value,

(44) 〈n〉 = Σ_n n P(n),

and the variance is given by

(45) V(n) = 〈n²〉 − 〈n〉² = Σ_n n² P(n) − 〈n〉².
It is the variance that tells us about fluctuations. A standard element in introduc-
tory discussions of the random walk is a computation of the mean and variance
of the distance traveled d using the binomial distribution. It is not difficult to see
that, for the case p = 0.5,

(46) 〈d〉 = 0,
(47) 〈d²〉 = N.

The first result makes sense: if the particle is equally likely to go left and right, then
on average it should get nowhere. The square root of the second result tells us
how far away the particle typically wanders in N steps: √N. The random walk
is a remarkably general mathematical system and we can obtain the same results
using a very elegant approach, due to Feynman: Let the vector L represent a step
of length L taken in an arbitrary direction. After N − 1 steps the net displacement
can be represented by the vector R_{N−1}. Thus,

R_N = R_{N−1} + L

and

R_N² = R_N · R_N = R_{N−1}² + 2 R_{N−1} · L + L².

Since each step is taken in a random direction, 〈R_{N−1} · L〉 = 0. It follows that

〈R_N²〉 = 〈R_{N−1}²〉 + 〈L²〉.
Therefore, by induction,

〈R_N²〉 = N〈L²〉.

This argument works in an arbitrary number of dimensions and includes random
walks for which the step length is not uniform. Given these results, we see that the
root-mean-square deviation — the square root of the variance — is proportional to
√N. Note that as N increases, the random walk is capable of further and further
excursions from the starting point. However, the relative fluctuations — the
ratio of the root-mean-square deviation to N — tend to zero: measured against the
size of the system, a large system appears increasingly uniform. In the exercises,
we examine the consequences of this important result for fluctuations in an ideal gas.
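Feynman's induction argument predicts 〈R_N²〉 = N〈L²〉 in any dimension, even for non-uniform step lengths. A simulation sketch for a two-dimensional walk with random step lengths (all parameters below are arbitrary choices):

```python
import math
import random

random.seed(7)

def walk_mean_sq(N, walkers=20_000):
    """Average squared net displacement after N steps of a 2D walk
    with a random direction and a random length on each step."""
    total = 0.0
    for _ in range(walkers):
        x = y = 0.0
        for _ in range(N):
            theta = random.uniform(0.0, 2.0 * math.pi)
            L = random.uniform(0.0, 2.0)  # non-uniform step length
            x += L * math.cos(theta)
            y += L * math.sin(theta)
        total += x * x + y * y
    return total / walkers

# <L^2> for L uniform on [0, 2] is 4/3.
L2 = 4.0 / 3.0
for N in (5, 20):
    print(N, walk_mean_sq(N), N * L2)
# The simulated <R_N^2> matches N<L^2>, as the induction predicts.
```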
Another standard element in introductory discussions of the random walk is the
large N limit, where the binomial distribution turns into a Gaussian or normal
distribution,

(48) P(d) −→ (1/√(2πσ²)) exp[−d²/(2σ²)].
This result is in fact far deeper than the binomial distribution, or even the random
walk itself; it is an expression of one of the most important theorems in probability
theory, the central limit theorem:
Given a distribution with mean m and variance σ2, the sampling
distribution of the mean approaches a normal distribution with
mean m and variance σ2/N as N , the sample size, increases.
The meaning of this statement is as follows. Consider drawing a random sample
of size N from the given distribution and calculating the mean µ of this sample.
If we were able to repeat this experiment an infinite number of times, calculating
the mean of each sample drawn, we would obtain a distribution called the sampling
distribution of the mean. This distribution describes what the mean µ of a randomly
drawn sample of size N is likely to be. The central limit theorem, stated above,
claims that
(1) The mean of the sampling distribution of means equals the mean m of the
original distribution from which the samples are drawn.
(2) The variance of the sampling distribution of means equals the variance σ2
of the original distribution from which the samples are drawn, divided by
N .
(3) Regardless of the shape of the original distribution, the sampling distri-
bution of the mean approaches a normal distribution as the sample size N
increases (remember, N is the sample size and not the number of samples).
Thus, for the random walk considered earlier, the sample size is the number of
steps taken. (1) and (2) on the above list restate our results for the mean and
variance of a random walk. (3), however, is new and expresses the statement that
the binomial distribution looks more and more like a normal distribution as the
number of steps taken gets larger. For the one-dimensional random walk, we can
take this limit directly, using Stirling's approximation to convert factorials of large
numbers into exponentials; see the exercises below.
Note that there is no conflict between the variance σ²/N appearing in the central
limit theorem and the variance N of the random walk: the theorem concerns the
mean displacement per step, d/N, whose variance is σ²/N with σ² the single-step
variance. The total displacement d = N(d/N) then has variance N²(σ²/N) = Nσ².
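The three claims of the central limit theorem can be illustrated by sampling. The sketch below draws samples of size N from a deliberately non-normal parent distribution (a single die roll, an arbitrary choice) and examines the sampling distribution of the mean:

```python
import random
import statistics

random.seed(3)

# Parent distribution: a single die roll, uniform on 1..6 (flat,
# not normal); mean 3.5, variance 35/12.
m, var = 3.5, 35.0 / 12.0

N = 50            # sample size
samples = 20_000  # number of samples drawn

means = [statistics.fmean(random.randint(1, 6) for _ in range(N))
         for _ in range(samples)]

print(statistics.fmean(means))     # close to m = 3.5
print(statistics.variance(means))  # close to var / N = 0.0583...
```

A histogram of `means` would also look bell-shaped, claim (3), even though the parent distribution is flat.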
Exercise 1.10 (Combinatorics and the factorial function).

(1) Given a list of N distinct numbers, how many different permutations of
these numbers exist? (Answer: N! = N(N − 1)(N − 2) · · · 1)
(2) Given a list of N numbers, N − m of which are distinct, how many different
permutations of these numbers exist? (Answer: N!/m!)
(3) Given a list of N distinct numbers, show that there are

N!/(N − m)!

different ways of selecting m of these numbers.
(4) Given a list of N distinct numbers, show that there are

N!/(m!(N − m)!)

different ways of selecting m of these numbers, when the order in which
they are selected does not matter. This combination of factorials is called
"N choose m".
Exercise 1.11 (The binomial distribution and the random walk). Derive equa-
tion (42).

Solution. The problem breaks into two parts: First, show that the prob-
ability of a particular sequence of steps — i.e. first two to the right, then one to
the left, and so on until N steps have been taken — is equal to p^n q^(N−n), where n
is the total number of steps taken in the +x direction and N − n is the total taken in
the −x direction. The net distance traveled is d = (2n − N)s. Next, we have to consider
all the various sequences of steps which result in a final displacement of d. There
are N choose n of these sequences; this multiplicity factor completes the
derivation.
Exercise 1.12 (Mean and variance of the binomial distribution). Derive equa-
tions (46) and (47).

Solution. First, note that

(49) (px + q)^N = Σ_{n=0}^{N} (N choose n) (px)^n q^(N−n) = Σ_n P(n) x^n,

and therefore

d/dx (px + q)^N = Σ_{n=0}^{N} n P(n) x^(n−1).

The right-hand side of this expression, evaluated at x = 1, is the mean value of
n. Therefore, q = 1 − p implies that 〈n〉 = Np. A similar argument gives us
〈n²〉 = Np + N(N − 1)p². When we express n in terms of d and substitute into the
definitions of mean and variance,

〈d〉 = Σ_d d P(d),
〈(d − 〈d〉)²〉 = 〈d²〉 − 〈d〉²,

the conclusion quickly follows. This slick derivation is due to Chandrasekhar (1943).
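The moments derived above, and equations (46)-(47) for d, can be verified by summing the binomial distribution directly. A short sketch (the choices N = 12 and p = 0.3 are arbitrary):

```python
import math

N, p = 12, 0.3

def P(n):
    """Binomial distribution, equation (42); reads N and p globally."""
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

mean_n = sum(n * P(n) for n in range(N + 1))
mean_n2 = sum(n * n * P(n) for n in range(N + 1))

assert math.isclose(mean_n, N * p)                       # <n> = Np
assert math.isclose(mean_n2, N * p + N * (N - 1) * p**2)  # <n^2>

# Mean and variance of d = 2n - N for the symmetric walk p = 1/2
# (rebinding p is picked up by P, which reads it globally):
p = 0.5
mean_d = sum((2 * n - N) * P(n) for n in range(N + 1))
var_d = sum((2 * n - N) ** 2 * P(n) for n in range(N + 1)) - mean_d**2
print(mean_d, var_d)  # approximately 0 and N, equations (46)-(47)
```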
Exercise 1.13 (The Poisson distribution). Show that the binomial distribu-
tion,

P(n) = (N choose n) p^n (1 − p)^(N−n),

for the special case of small probabilities, p ≪ 1, and infrequent events, n ≪ N,
reduces to

(50) P(n) = (λ^n / n!) exp(−λ),

where λ = Np. (50) is known as the Poisson distribution. Show that this distribu-
tion is correctly normalized. Calculate the mean and variance of n for a Poisson
distribution.
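The Poisson limit can be checked numerically by comparing the binomial probabilities against (50) for large N and small p. A sketch with the arbitrary choice N = 500, p = 0.01 (so λ = 5):

```python
import math

N, p = 500, 0.01
lam = N * p  # lambda = 5

def binom(n):
    return math.comb(N, n) * p**n * (1 - p)**(N - n)

def poisson(n):
    return lam**n / math.factorial(n) * math.exp(-lam)

for n in range(4):
    print(n, round(binom(n), 5), round(poisson(n), 5))
# The two distributions agree closely, and the agreement improves
# as N grows with lambda = N p held fixed.
```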
In terms of N, p, and d alone, the binomial distribution reads

(51) P(d) = (N choose ½(N + d)) p^(½(N+d)) (1 − p)^(½(N−d)).

Let's return to the binomial distribution for p = 0.5,

P(d) = (N choose ½(N + d)) (1/2)^N.

The limit of this distribution for very large N and N ≫ d is of central importance.
Using Stirling's approximation,

log N! ≈ N log N − N,

we can write

log P(d) ≈ N log N − ½(N + d) log(½(N + d)) − ½(N − d) log(½(N − d)) − N log 2.

Using d ≪ N, we can Taylor expand the logarithms,

log(½(N ± d)) = log((N/2)(1 ± d/N)) = log(N/2) ± d/N − d²/(2N²).

With these approximations, the expression for log P(d) reduces to

log P(d) ≈ −d²/(2N).

Normalizing, we obtain

P(d) ≈ (1/√(2πN)) exp(−d²/(2N)),

a Gaussian distribution.
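The Gaussian limit can be checked against the exact binomial probabilities. One subtlety: for N even, d takes only even values, so the Gaussian density must be doubled to match P(d) point by point. A sketch (N = 400 is an arbitrary choice):

```python
import math

# Exact binomial walk distribution P(d) at p = 1/2, versus the
# Gaussian limit exp(-d^2 / 2N) / sqrt(2 pi N).
N = 400

def P(d):
    n = (N + d) // 2  # number of +x steps giving displacement d
    return math.comb(N, n) * 0.5**N

def gauss(d):
    return math.exp(-d * d / (2 * N)) / math.sqrt(2 * math.pi * N)

for d in (0, 10, 20, 40):
    print(d, P(d), 2 * gauss(d))
# The factor of 2 accounts for d taking only even values here.
```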
Exercise 1.14 (Fluctuations in an ideal gas). In equilibrium, we expect an ideal
gas to be evenly distributed throughout its volume. On the other hand, microscopic
fluctuations do occur. In this exercise, we demonstrate why large fluctuations from
uniformity in an ideal gas are extremely rare.

(1) Consider a macroscopic ideal gas of N molecules confined to a volume V
and partition this volume into two equal subvolumes, a "left" and a "right".
What is the probability that there are m more molecules on the left than
on the right?
(2) Show that the probability distribution P(m) is Gaussian for m ≪ N. What
is the width of this Gaussian and what are the consequences for large
fluctuations?
Exercise 1.15 (Pressure in an ideal gas). In this exercise, we will rederive the
relationship between pressure and internal energy in an ideal gas, starting from

P = 〈F〉/A = nm〈v_x²〉.

(1) Let P(p_x) represent the probability that the x-component of a particular
molecule's momentum equals p_x. Constant total energy E in the gas con-
fines the momentum vector p to lie somewhere on a sphere of radius
R = √(2mE) in momentum space.
(2) Derive the Gaussian form of P(p_x). Do not use the equipartition theorem;
instead let β = 3N/(2E).
(3) Use this distribution to compute 〈v²〉 (the Gamma function is useful here);
substituting this value of β recovers the usual result.
7. The Statistical Basis for Entropy
In order to connect the collective behavior of the system with the dynamics of its
constituent particles, statistical mechanics makes a distinction between microscopic
and macroscopic states of the system which makes precise the content of the atomic
hypothesis: A microscopic state or microstate of the system expresses the precise
state of every single particle in the system. Thus, classically, each point (q, p) in
phase space represents a distinct microstate of the system. If a quantum mechanical
viewpoint is taken, however, a complete specification of the quantum numbers for
each particle replaces the specification of the qi and pi. A macroscopic state or
macrostate of the system, on the other hand, expresses only the values of large-
scale properties such as total energy, total number of particles, and volume occupied
or local density. In a macrostate, all individual particle state information has been
averaged out; a macrostate encodes far less information. In this and the following
sections, we will discuss how the distinction between macroscopic and microscopic
states leads to a statistical basis for the thermodynamic concept of entropy.
The microscopic description is always constrained by the macroscopic descrip-
tion. Finite energy, for instance, restricts trajectories to remain in bounded regions
of phase space. When total energy is conserved, trajectories are confined to a par-
ticular hypersurface. Microstates which are consistent with a given macrostate are
called accessible microstates. Let Ω represent the number of accessible microstates
associated with a particular macrostate. In the microcanonical ensemble, the ac-
cessible microstates are spread uniformly across an energy surface; Ω is therefore
in this case a function of energy alone, Ω(E). Furthermore, the probability of the
system being in any one of the Ω(E) different accessible microstates is

p = 1/Ω(E),

since these microstates are assumed to be occupied with equal probability. For a
classical system, Ω is simply the volume of the accessible region of phase space.
The basic idea is that the system is more likely to be found in macrostates with
larger multiplicities; entropy is simply another, additive, measure of this multiplicity.
If, on the other hand, E is allowed to vary, then the assumption of equal
a priori probabilities implies that the system is more likely to be found in those
macrostates with greater multiplicity. This is the connection between thermody-
namic entropy S and multiplicity,

(52) S(E) = f(Ω(E)).
We can determine the form of this function f using only the assumption that
entropy is additive and the rules of combinatorics (cf. Einstein): Consider two
non-interacting systems brought together. If the multiplicities for the two systems
are Ω1 and Ω2, respectively, then the multiplicity of the combined system is the
product,
(53) Ω = Ω1Ω2.
By assumption, the entropy of the combined system is the sum of the entropies of
the subsystems (since they are non-interacting),
(54) S = S1 + S2.
Then, (52)-(54) together imply

f(Ω1 Ω2) = f(Ω) = S = S1 + S2 = f(Ω1) + f(Ω2).

This condition requires that f be a logarithmic function of Ω,

(55) S = k log Ω.

The constant k is determined by the choice of units. Historically, entropy was
measured in units of joules per kelvin. This sets the value of k at

k_B = 1.381 × 10⁻²³ J/K.

k_B is known as Boltzmann's constant.
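The logarithm in (55) can be checked directly: multiplicities multiply while entropies add. A sketch using two hypothetical two-level (spin) systems, for which Ω is a binomial coefficient (the particle numbers below are arbitrary choices):

```python
import math

k = 1.381e-23  # Boltzmann's constant, J/K

def S(omega):
    """Entropy S = k log(Omega) for a macrostate of multiplicity Omega."""
    return k * math.log(omega)

# Two non-interacting spin systems: Omega = (N choose n_up) counts
# the microstates with a given number of up spins.
omega1 = math.comb(100, 40)
omega2 = math.comb(80, 10)

# Multiplicities multiply (53), so entropies add (54).
assert math.isclose(S(omega1 * omega2), S(omega1) + S(omega2))
```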
8. Thermodynamic Equilibrium
Consider again two subsystems brought into contact, except now allow energy
to flow between them. The total energy E of the combined system is equal to the
sum of the energies E1 and E2 of the two subsystems separately; note that E is
conserved if we assume that the total system is closed. Thus, the total entropy can
be written,
S(E) = S1(E1) + S2(E2) = S1(E1) + S2(E − E1).
The equilibrium condition that the entropy S be at a maximum requires

dS/dE1 = dS1/dE1 + (dS2/dE2)(dE2/dE1) = dS1/dE1 − dS2/dE2 = 0.
In thermal equilibrium, no heat flows between the two subsystems and we say that
they are at the same temperature. The equality,

dS1/dE1 = dS2/dE2,

derived above, therefore suggests that we define temperature in terms of this de-
rivative. The following definition is the convention:

(56) 1/T = dS/dE.
Thus, from the statistical mechanics viewpoint, temperature can be understood
as an energy cost for stealing entropy from a system. Note that, like entropy,
temperature is a purely statistical quantity which only has meaning for macroscopic
systems.
This viewpoint also sharpens our understanding of the approach to equilibrium.
Consider the same two subsystems in thermal contact, except assume that they are
not in equilibrium — i.e. their temperatures are not equal. Then the second law
of thermodynamics implies

(d/dt) S(E) = (d/dt) S1(E1) + (d/dt) S2(E2)
            = (dS1/dE1)(dE1/dt) + (dS2/dE2)(dE2/dt) > 0.
However, the total energy is conserved: E2 = E − E1, where E is constant. There-
fore,

dS/dt = (dS1/dE1 − dS2/dE2) dE1/dt = (1/T1 − 1/T2) dE1/dt > 0.
If the temperature of the second subsystem is greater than that of the first, T2 > T1,
then E1 must increase monotonically with time in order to satisfy the above in-
equality. Likewise, if T1 > T2, then E1 must decrease with time. Energy flows from
regions of higher temperature to regions of lower temperature, until thermal equi-
librium is reached.
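The argument above can be turned into a toy simulation: give each subsystem an assumed entropy function S_i = C_i log E_i (so that T_i = E_i/C_i), let energy flow at a rate proportional to 1/T1 − 1/T2, and watch the temperatures equalize while the total entropy rises. All parameters below are arbitrary choices:

```python
import math

# Two subsystems with entropies S_i = C_i log(E_i), exchanging energy
# at a rate proportional to (1/T1 - 1/T2), with T_i = E_i / C_i.
C1, C2 = 3.0, 5.0
E1, E2 = 10.0, 2.0  # subsystem 2 starts colder (T1 = 3.33, T2 = 0.4)
rate, dt = 0.5, 0.01

def total_S():
    return C1 * math.log(E1) + C2 * math.log(E2)

S_prev = total_S()
for _ in range(5000):
    dE1 = rate * (C1 / E1 - C2 / E2) * dt  # proportional to 1/T1 - 1/T2
    E1 += dE1
    E2 -= dE1                              # total energy conserved
    assert total_S() >= S_prev - 1e-12     # entropy never decreases
    S_prev = total_S()

print(E1 / C1, E2 / C2)  # the two temperatures converge
```

Energy flows out of the hotter subsystem and into the colder one, exactly as the sign argument in the text requires.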
In (or near) equilibrium, the macrostate of a system is determined by the total
energy E, the total number of particles N, and the volume V; all other macroscopic
quantities can be written as functions of these variables. Thus, given a function
of state, S(E, V, N), we can produce other functions of state such as E(S, V, N)
by inverting the variables. We have seen that temperature can be defined as a
derivative of this function; applying the inverse function theorem, we may rewrite
(56) as

(57) T = (∂E/∂S)_{N,V}.
The partial derivative in (57) reminds us that S depends on several thermodynamic
quantities; thus there are two other analogous expressions which might be derived
in the same way. Just as

dE1/dS1 = dE2/dS2

holds for our two systems in equilibrium, so too must the expressions

(58) dE1/dV1 = dE2/dV2,
(59) dE1/dN1 = dE2/dN2

hold as well. (58) is used to define pressure P and (59) is used to define chemical
potential µ:

(60) P = −(∂E/∂V)_{S,N};  µ = (∂E/∂N)_{S,V}.
The minus sign in the definition of P ensures that pressure is positive for ordinary
systems, whose energy decreases upon expansion at constant entropy. Here, as with
temperature, the second law of thermodynamics implies that the volumes of the
two systems change in a well-defined way, with one system expanding and forcing
the other to contract so as to equalize the pressures. From this perspective, pressure
can be understood as defining an energy cost for decreasing the volume. The
condition of equal pressures (58) is called mechanical equilibrium; in general a
non-equilibrium system settles into mechanical equilibrium much faster than thermal
equilibrium (REF). Similarly, particle flow between the two systems (if possible) is
driven by the difference in chemical potentials, and µ can be understood as defining
an energy cost for stealing particles from a system. The condition (59) is called
chemical equilibrium. We will have much more to say about the chemical
potential later on.
9. Does Entropy Increase?
How can irreversible macroscopic behavior, such as the increase of entropy dictated
by the second law, arise from microscopic laws that are reversible in time? We take
up this apparent paradox in this section.
10. Information Theory
References
R. Bowley and M. Sanchez, Introductory Statistical Mechanics 2nd ed., Claren-
don Press, Oxford (2001).
S. Chandrasekhar, "Stochastic Problems in Physics and Astronomy", Reviews
of Modern Physics, Vol. 15, No. 1, pp. 1-89 (1943).
C. Kittel and H. Kroemer, Thermal Physics 2nd ed., W. H. Freeman and Co.,
New York (1995).
Landau and Lifschitz
Onsager
Pathria
Reif