
MAT 280: Multivariable Calculus

James V. Lambers

July 17, 2012

Contents

1 Partial Derivatives

1.1 Introduction

1.1.1 Partial Differentiation

1.1.2 Multiple Integration

1.1.3 Vector Calculus

1.2 Functions of Several Variables

1.2.1 Terminology and Notation

1.2.2 Visualization Techniques

1.3 Limits and Continuity

1.3.2 Defining Limits Using Neighborhoods

1.3.3 Results

1.3.4 Techniques for Establishing Limits and Continuity

1.4 Partial Derivatives

1.4.2 Clairaut’s Theorem

1.4.3 Techniques

1.5 Tangent Planes, Linear Approximations and Differentiability

1.5.1 Tangent Planes and Linear Approximations

1.5.2 Functions of More than Two Variables

1.5.3 The Gradient Vector

1.5.4 The Jacobian Matrix

1.5.5 Differentiability

1.6 The Chain Rule

1.6.1 The Implicit Function Theorem

1.7 Directional Derivatives and the Gradient Vector

1.7.1 The Gradient Vector

1.7.2 Directional Derivatives

1.7.3 Tangent Planes to Level Surfaces

1.8 Maximum and Minimum Values

1.9 Constrained Optimization

1.10 Appendix: Linear Algebra Concepts

1.10.1 Matrix Multiplication

1.10.2 Eigenvalues

1.10.3 The Transpose, Inner Product and Null Space

2 Multiple Integrals

2.1 Double Integrals over Rectangles

2.2 Double Integrals over More General Regions

2.2.1 Changing the Order of Integration

2.2.2 The Mean Value Theorem for Integrals

2.3 Double Integrals in Polar Coordinates

2.4 Triple Integrals

2.5 Applications of Double and Triple Integrals

2.6 Triple Integrals in Cylindrical Coordinates

2.7 Triple Integrals in Spherical Coordinates

2.8 Change of Variables in Multiple Integrals

3 Vector Calculus

3.1 Vector Fields

3.2 Line Integrals

3.3 The Fundamental Theorem for Line Integrals

3.4 Green’s Theorem

3.5 Curl and Divergence

3.6 Parametric Surfaces and Their Areas

3.7 Surface Integrals

3.7.1 Surface Integrals of Scalar-Valued Functions

3.7.2 Surface Integrals of Vector Fields

3.8 Stokes’ Theorem

3.8.1 A Note About Orientation

3.9 The Divergence Theorem

3.10 Differential Forms

Chapter 1

Partial Derivatives

1.1 Introduction

This course is the fourth course in the calculus sequence, following MAT 167, MAT 168 and MAT 169. Its purpose is to prepare students for more advanced mathematics courses, particularly courses in mathematical programming (MAT 419), advanced engineering mathematics (MAT 430), real analysis (MAT 441), complex analysis (MAT 436), and numerical analysis (MAT 460 and 461). The course will focus on three main areas, which we briefly discuss here.

1.1.1 Partial Differentiation

In single-variable calculus, you learned how to compute the derivative of a function of one variable, $y = f(x)$, with respect to its independent variable $x$, denoted by $dy/dx$. In this course, we consider functions of several variables. In most cases, the functions we use will depend on two or three variables, denoted by $x$, $y$ and $z$, corresponding to spatial dimensions.

When a function $f(x, y, z)$, for example, depends on several variables, it is not possible to describe its rate of change with respect to those variables using a single quantity such as the derivative. Instead, this rate of change is a vector quantity, called the gradient, denoted by $\nabla f$. Each component of the gradient is the partial derivative of $f$ with respect to one of its independent variables, $x$, $y$ or $z$. That is,

$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} & \dfrac{\partial f}{\partial z} \end{bmatrix}.$$

For example, the partial derivative of $f$ with respect to $x$, denoted by $\partial f/\partial x$, describes the instantaneous rate of change of $f$ with respect to $x$ when $y$ and $z$ are kept constant. Partial derivatives can be computed using the same differentiation techniques as in single-variable calculus, but one must be careful, when differentiating with respect to one variable, to treat all other variables as if they are constant. For example, if $f(x, y) = x^2 y + y^3$, then

$$\frac{\partial f}{\partial x} = 2xy, \qquad \frac{\partial f}{\partial y} = x^2 + 3y^2,$$

because the $y^3$ term does not depend on $x$, and therefore its partial derivative with respect to $x$ is zero.
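The two partial derivatives above can be checked numerically. The following Python sketch is only an illustration, not part of the course material: it approximates each partial derivative with a central difference quotient, perturbing one variable while holding the other fixed, and compares the result with the formulas $2xy$ and $x^2 + 3y^2$. The helper names `partial_x` and `partial_y`, the step size `h`, and the sample point $(1.5, -2)$ are our own assumptions.

```python
def f(x, y):
    """The example function f(x, y) = x^2 y + y^3."""
    return x**2 * y + y**3


def partial_x(func, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative with respect to x.
    Only x is perturbed; y is treated as a constant."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def partial_y(func, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative with respect to y.
    Only y is perturbed; x is treated as a constant."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


x0, y0 = 1.5, -2.0  # sample point (an arbitrary choice)
print(partial_x(f, x0, y0), 2 * x0 * y0)        # estimate vs. exact value -6.0
print(partial_y(f, x0, y0), x0**2 + 3 * y0**2)  # estimate vs. exact value 14.25
```

Central differences are used because their error shrinks like $h^2$, so a moderate step already agrees with the exact formulas to several digits.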

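If

$$\mathbf{F}(x, y, z) = \begin{bmatrix} F_1(x, y, z) \\ F_2(x, y, z) \\ F_3(x, y, z) \end{bmatrix}$$

is a vector-valued function of three variables, then each of its component functions $F_1$, $F_2$, and $F_3$ has a gradient vector, and the rate of change of $\mathbf{F}$ with respect to $x$, $y$ and $z$ is described by a matrix, called the Jacobian matrix

$$J_{\mathbf{F}}(x, y, z) = \begin{bmatrix} \dfrac{\partial F_1}{\partial x} & \dfrac{\partial F_1}{\partial y} & \dfrac{\partial F_1}{\partial z} \\[6pt] \dfrac{\partial F_2}{\partial x} & \dfrac{\partial F_2}{\partial y} & \dfrac{\partial F_2}{\partial z} \\[6pt] \dfrac{\partial F_3}{\partial x} & \dfrac{\partial F_3}{\partial y} & \dfrac{\partial F_3}{\partial z} \end{bmatrix},$$

where each entry of $J_{\mathbf{F}}(x, y, z)$ is a partial derivative of one of the component functions with respect to one of the independent variables.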
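A Jacobian matrix can be approximated one column at a time. Perturbing a single independent variable and differencing every component function fills in the corresponding column. The sketch below assumes a hypothetical function $\mathbf{F}(x, y, z) = \langle x^2 y, yz, xz \rangle$, chosen only for illustration. Its exact Jacobian has rows $(2xy, x^2, 0)$, $(0, z, y)$ and $(z, 0, x)$. Row $i$ of the result holds the partial derivatives of $F_i$, matching the layout of $J_{\mathbf{F}}$ above. The helper names are our own.

```python
def F(x, y, z):
    """Hypothetical vector-valued function F = <x^2 y, y z, x z> (illustration only)."""
    return [x**2 * y, y * z, x * z]


def jacobian(func, point, h=1e-6):
    """Central-difference approximation of the Jacobian of func at point.
    Entry [i][j] approximates the partial derivative of component i
    with respect to independent variable j."""
    n = len(point)
    m = len(func(*point))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(point)
        backward = list(point)
        forward[j] += h
        backward[j] -= h
        f_plus, f_minus = func(*forward), func(*backward)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def exact_jacobian(x, y, z):
    """Jacobian of the hypothetical F, computed by hand for comparison."""
    return [[2 * x * y, x**2, 0.0],
            [0.0, z, y],
            [z, 0.0, x]]


approx = jacobian(F, (1.0, 2.0, 3.0))
exact = exact_jacobian(1.0, 2.0, 3.0)
print(approx)
print(exact)
```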

We will learn how to generalize various concepts and techniques from single-variable differential calculus to the multi-variable case. These include:

• Tangent lines, which become tangent planes for functions of two variables and tangent spaces for functions of three or more variables. These are used to compute linear approximations similar to those of functions of a single variable.

• The Chain Rule, which generalizes from a product of derivatives to a product of Jacobian matrices, using standard matrix multiplication. This allows computing the rate of change of a function as its independent variables change along any direction in space, not just along any of the coordinate axes, which in turn allows determination of the direction in which a function increases or decreases most rapidly.

• Computing maximum and minimum values of functions, which, in the multi-variable case, requires finding points at which the gradient is equal to the zero vector (corresponding to finding points at which the derivative is equal to zero) and checking whether the matrix of second partial derivatives is positive definite for a minimum, or negative definite for a maximum (which generalizes the second derivative test from the single-variable case). We will also learn how to compute maximum and minimum values subject to constraints on the independent variables, using the method of Lagrange multipliers.

1.1.2 Multiple Integration

Next, we will learn how to compute integrals of functions of several variables over multi-dimensional domains, generalizing the definite integral of a function $f(x)$ over an interval $[a, b]$. The integral of a nonnegative function of two variables $f(x, y)$ represents the volume under the surface described by the graph of $f$, just as the integral of a nonnegative function $f(x)$ is the area under the curve described by the graph of $f$.

In some cases, it is more convenient to evaluate an integral by first performing a change of variables, as in the single-variable case. For example, when integrating a function of two variables, polar coordinates are often useful. For functions of three variables, cylindrical and spherical coordinates, which are both generalizations of polar coordinates, are worth considering.

In the general case, evaluating the integral of a function of $n$ variables by first changing to $n$ different variables requires multiplying the integrand by the absolute value of the determinant of the Jacobian matrix of the function that maps the new variables to the old. This is a generalization of $u$-substitution from single-variable calculus, and also relates to formulas for area and volume from MAT 169 that are defined in terms of determinants, or equivalently, in terms of the dot product and cross product.
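As a small illustration of the determinant factor, the following Python sketch (our own example, not from the course) integrates $f(x, y) = x^2 + y^2$ over the unit disk in polar coordinates using the midpoint rule. The factor `r` in the sum is $|\det J| = r$ for the map $(r, \theta) \mapsto (r\cos\theta, r\sin\theta)$, and the exact value is $\int_0^{2\pi}\int_0^1 r^2 \cdot r\,dr\,d\theta = \pi/2$. The function name and grid sizes are arbitrary assumptions.

```python
import math


def integrate_polar(g, r_max, n_r=400, n_theta=400):
    """Midpoint-rule approximation of the double integral of g(x, y) over the
    disk of radius r_max centered at the origin, computed in polar coordinates.
    The extra factor r is |det J| for (r, theta) -> (r cos theta, r sin theta)."""
    dr = r_max / n_r
    dtheta = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            x, y = r * math.cos(theta), r * math.sin(theta)
            total += g(x, y) * r * dr * dtheta
    return total


approx = integrate_polar(lambda x, y: x**2 + y**2, 1.0)
print(approx, math.pi / 2)
```

Dropping the factor `r` from the sum would instead approximate $\int_0^{2\pi}\int_0^1 r^2\,dr\,d\theta = 2\pi/3$, which is one way to see why the determinant is needed.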

1.1.3 Vector Calculus

In the last part of the course, we will study vector fields, which are functions that assign a vector to each point in their domain, like the vector-valued function $\mathbf{F}$ described above. We will first learn how to compute line integrals, which are integrals of functions along curves. A line integral can be viewed as a generalization of the integral of a function on an interval, in that $dx$ is replaced by $ds$, an infinitesimal distance between points on the curve. It can also be viewed as a generalization of an integral that computes the arc length of a curve, as the line integral of a function that is equal to one yields the arc length. A line integral of a vector field is useful for computing the work done by a force applied to an object to move it along a curved path. To facilitate the computation of line integrals, a variation of the Fundamental Theorem of Calculus is introduced.
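The remark that the line integral of the constant function 1 gives arc length can be checked numerically. This Python sketch is an illustrative approximation of our own: it replaces $ds$ by the length of a short chord and evaluates the integrand at the parameter midpoint. The helix $\mathbf{r}(t) = \langle \cos t, \sin t, t \rangle$, $0 \le t \le 2\pi$, is an assumed test curve. Its exact length is $2\pi\sqrt{2}$, because $\|\mathbf{r}'(t)\| = \sqrt{2}$.

```python
import math


def line_integral_scalar(f, r, a, b, n=10_000):
    """Approximate the line integral of f over the curve r(t), a <= t <= b,
    by summing f(point at parameter midpoint) * (chord length), a discrete f ds."""
    total = 0.0
    dt = (b - a) / n
    for k in range(n):
        p, q = r(a + k * dt), r(a + (k + 1) * dt)
        ds = math.dist(p, q)  # chord length approximating ds
        mid = r(a + (k + 0.5) * dt)
        total += f(*mid) * ds
    return total


def helix(t):
    """Assumed test curve r(t) = <cos t, sin t, t>."""
    return (math.cos(t), math.sin(t), t)


length = line_integral_scalar(lambda x, y, z: 1.0, helix, 0.0, 2 * math.pi)
print(length, 2 * math.pi * math.sqrt(2))
```

With a general integrand such as $f(x, y, z) = z$, the same function approximates $\int_C z\,ds = \sqrt{2}\,(2\pi^2)$ for this helix.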

Next, we generalize the notion of a parametric curve to a parametric surface, in which the coordinates of points on the surface depend on two parameters $u$ and $v$, instead of a single parameter $t$ for a parametric curve. Using the relation between the cross product and the area of a parallelogram, we can define the integral of a function over a parametric surface, which is similar to how a change of variables in a double integral is handled. Then, we will learn how to integrate vector fields over parametric surfaces, which is useful for computing the mass of fluid that crosses a surface, given the rate of flow per unit area.

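We conclude with a discussion of several fundamental theorems of vector calculus: Green’s Theorem, Stokes’ Theorem, and the Divergence Theorem. All of these can be seen to be generalizations of the Fundamental Theorem of Calculus to higher dimensions, in that they relate the integral of a function over the interior of a domain to an integral of a related function over its boundary. These theorems can be conveniently stated using the div and curl operations on vector fields. Specifically, if $\mathbf{F} = \langle P, Q, R \rangle$, then

$$\operatorname{div} \mathbf{F} = \nabla \cdot \mathbf{F} = \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z},$$

$$\operatorname{curl} \mathbf{F} = \nabla \times \mathbf{F} = \left\langle \frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z}, \frac{\partial P}{\partial z} - \frac{\partial R}{\partial x}, \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right\rangle.$$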
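The component formulas for div and curl translate directly into finite-difference code. The sketch below is illustrative only. The helper names and the test field $\mathbf{F} = \langle xy, yz, zx \rangle$ are our assumptions, not from the text. For that field, $\operatorname{div}\mathbf{F} = x + y + z$ and $\operatorname{curl}\mathbf{F} = \langle -y, -z, -x \rangle$.

```python
def partial(func, point, j, h=1e-5):
    """Central-difference estimate of d(func)/d(x_j) at point = (x, y, z)."""
    forward, backward = list(point), list(point)
    forward[j] += h
    backward[j] -= h
    return (func(*forward) - func(*backward)) / (2 * h)


def div(P, Q, R, point):
    """div F = P_x + Q_y + R_z."""
    return partial(P, point, 0) + partial(Q, point, 1) + partial(R, point, 2)


def curl(P, Q, R, point):
    """curl F = <R_y - Q_z, P_z - R_x, Q_x - P_y>."""
    return (partial(R, point, 1) - partial(Q, point, 2),
            partial(P, point, 2) - partial(R, point, 0),
            partial(Q, point, 0) - partial(P, point, 1))


# Assumed test field F = <x y, y z, z x>, chosen only for illustration.
def P(x, y, z):
    return x * y


def Q(x, y, z):
    return y * z


def R(x, y, z):
    return z * x


point = (1.0, 2.0, 3.0)
print(div(P, Q, R, point))   # exact value: 1 + 2 + 3 = 6
print(curl(P, Q, R, point))  # exact value: (-2, -3, -1)
```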

However, using the language of differential forms, we can condense the Fundamental Theorem of Calculus and all four of its variations into one theorem, known as the General Stokes’ Theorem. We now state all six results; their discussion is deferred to Chapter 3.

Fundamental Theorem of Calculus:

$$\int_a^b f'(x)\,dx = f(b) - f(a)$$

where $f$ is continuously differentiable on $[a, b]$

Fundamental Theorem of Line Integrals:

$$\int_a^b \nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\,dt = f(\mathbf{r}(b)) - f(\mathbf{r}(a))$$

where $\mathbf{r}(t) = \langle x(t), y(t), z(t) \rangle$, $a \le t \le b$, is the position function for a curve $C$ and $f(x, y, z)$ is a continuously differentiable function defined on $C$

Green’s Theorem:

$$\iint_D \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) dA = \oint_{\partial D} P\,dx + Q\,dy$$

where $D$ is a 2-D region with piecewise smooth boundary $\partial D$ and $P$ and $Q$ are continuously differentiable on $D$

Stokes’ Theorem:

$$\iint_S \operatorname{curl} \mathbf{F} \cdot \mathbf{n}\,dS = \oint_{\partial S} \mathbf{F} \cdot \mathbf{T}\,ds$$

where $S$ is a surface in 3-D with unit normal vector $\mathbf{n}$ and piecewise smooth boundary $\partial S$ with unit tangent vector $\mathbf{T}$, and $\mathbf{F}$ is a continuously differentiable vector field

Divergence Theorem:

$$\iiint_E \operatorname{div} \mathbf{F}\,dV = \iint_{\partial E} \mathbf{F} \cdot \mathbf{n}\,dS$$

where $E$ is a solid region in 3-D with boundary surface $\partial E$, which has outward unit normal vector $\mathbf{n}$, and $\mathbf{F}$ is a continuously differentiable vector field

General Stokes’ Theorem:

$$\int_M d\omega = \int_{\partial M} \omega$$

where $M$ is a $k$-manifold and $\omega$ is a $(k-1)$-form on $M$
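The statements above are exact, but a numerical check can make one of them concrete. The following Python sketch is an illustration under assumptions of our own choosing. It tests Green’s Theorem with $P = -y$ and $Q = x$ on the unit disk, traversing the boundary counterclockwise (the orientation is our assumption; it is not stated above). Both sides should be close to $2\pi$, since $\partial Q/\partial x - \partial P/\partial y = 2$ and the disk has area $\pi$. The function names and grid sizes are hypothetical choices.

```python
import math


def P(x, y):
    return -y


def Q(x, y):
    return x


def scalar_curl(x, y, h=1e-5):
    """Q_x - P_y, estimated with central differences."""
    return ((Q(x + h, y) - Q(x - h, y)) - (P(x, y + h) - P(x, y - h))) / (2 * h)


def area_side(n_r=200, n_theta=200):
    """Double integral of Q_x - P_y over the unit disk (polar midpoint rule)."""
    dr, dth = 1.0 / n_r, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            th = (j + 0.5) * dth
            total += scalar_curl(r * math.cos(th), r * math.sin(th)) * r * dr * dth
    return total


def boundary_side(n=10_000):
    """Line integral of P dx + Q dy counterclockwise around the unit circle,
    using midpoint values of P and Q times the increments dx and dy."""
    total = 0.0
    for k in range(n):
        t0, t1 = 2 * math.pi * k / n, 2 * math.pi * (k + 1) / n
        x0, y0 = math.cos(t0), math.sin(t0)
        x1, y1 = math.cos(t1), math.sin(t1)
        xm, ym = math.cos((t0 + t1) / 2), math.sin((t0 + t1) / 2)
        total += P(xm, ym) * (x1 - x0) + Q(xm, ym) * (y1 - y0)
    return total


print(area_side(), boundary_side(), 2 * math.pi)
```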

1.2 Functions of Several Variables

Multi-variable calculus is concerned with the study of quantities that depend on more than one variable, such as temperature that varies within a three-dimensional object, which is a scalar quantity, or the velocity of a flowing liquid, which is a vector quantity. To aid in this study, we first introduce some important terminology and notation that are useful for working with functions of more than one variable, and then introduce some techniques for visualizing such functions.

1.2.1 Terminology and Notation

The following standard notation and terminology are used to define and discuss functions of several variables and their visual representations. As they will be used throughout the course, it is important to become acquainted with them immediately.

• The set $\mathbb{R}$ is the set of all real numbers.

• If $S$ is a set and $x$ is an element of $S$, we write $x \in S$.

Example $2 \in \mathbb{R}$ and $\pi \in \mathbb{R}$, but $i \notin \mathbb{R}$, where $i = \sqrt{-1}$ is an imaginary number.

• If $S$ is a set and $T$ is a subset of $S$ (that is, every element of $T$ is in $S$), we write $T \subseteq S$.

Example The set of natural numbers, $\mathbb{N}$, and the set of integers, $\mathbb{Z}$, satisfy $\mathbb{N} \subseteq \mathbb{Z}$. Furthermore, $\mathbb{Z} \subseteq \mathbb{R}$.

• The set $\mathbb{R}^n$ is the set of all ordered $n$-tuples $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ of real numbers. Each real number $x_i$ is called a coordinate of the point $\mathbf{x}$.

Example Each point $(x, y) \in \mathbb{R}^2$ has coordinates $x$ and $y$. Each point $(x, y, z) \in \mathbb{R}^3$ has an $x$-, $y$- and $z$-coordinate.

• A function $f$ with domain $D \subseteq \mathbb{R}^n$ and range $R \subseteq \mathbb{R}^m$ is a set of ordered pairs of the form $\{(\mathbf{x}, \mathbf{y})\}$, where $\mathbf{x} \in D$ and $\mathbf{y} \in R$, such that each element $\mathbf{x} \in D$ is mapped to only one element of $R$. That is, there is only one ordered pair in $f$ such that $\mathbf{x}$ is its first element. We write $f : D \to R$ to indicate that $f$ maps elements of $D$ to elements of $R$. We also say that $f$ maps $D$ into $R$.

Example Let $\mathbb{R}^+$ denote the set of non-negative real numbers. The function $f(x, y) = x^2 + y^2$ maps $\mathbb{R}^2$ into $\mathbb{R}^+$, and we can write $f : \mathbb{R}^2 \to \mathbb{R}^+$.

• Let $f : D \to R$, and let $D \subseteq \mathbb{R}^n$ and $R \subseteq \mathbb{R}^m$. If $m = 1$, we say that $f$ is a scalar-valued function, and if $m > 1$, we say that $f$ is a vector-valued function. If $n = 1$, we say that $f$ is a function of one variable, and if $n > 1$, we say that $f$ is a function of several variables. For each $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in D$, the coordinates $x_1, x_2, \ldots, x_n$ of $\mathbf{x}$ are called the independent variables of $f$, and for each $\mathbf{y} = (y_1, y_2, \ldots, y_m) \in R$, the coordinates $y_1, y_2, \ldots, y_m$ of $\mathbf{y}$ are called the dependent variables of $f$.

Example The function $z = x^2 + y^2$ is a scalar-valued function of several variables. The independent variables are $x$ and $y$, and the dependent variable is $z$. The function $\mathbf{r}(t) = \langle x(t), y(t), z(t) \rangle$, where $x(t) = t\cos t$, $y(t) = t\sin t$, and $z(t) = e^t$, is a vector-valued function with independent variable $t$ and dependent variables $x$, $y$ and $z$.

• Let $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$. The graph of $f$ is the subset of $\mathbb{R}^{n+1}$ consisting of the points $(x_1, x_2, \ldots, x_n, f(x_1, x_2, \ldots, x_n))$, where $(x_1, x_2, \ldots, x_n) \in D$.

Example The graph of the function $z = x^2 + y^2$ is a paraboloid of revolution, obtained by revolving the parabola $z = x^2$ around the $z$-axis. The graph of the function $z = x + y - 1$ is a plane in 3-D space that passes through the points $(0, 0, -1)$ and $(1, 1, 1)$.

• A function $f : \mathbb{R}^n \to \mathbb{R}$ is a linear function if $f$ has the form

$$f(x_1, x_2, \ldots, x_n) = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + b,$$

where $a_1, a_2, \ldots, a_n$ and $b$ are constants.

Example The function $y = mx + b$ is a linear function of the single independent variable $x$. Its graph is a line contained within the $xy$-plane, with slope $m$, passing through the point $(0, b)$. The function $z = ax + by + c$ is a linear function of the two independent variables $x$ and $y$. Its graph is a plane in 3-D space that passes through the points $(0, 0, c)$ and $(1, 1, a + b + c)$.

• Let $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$. We say that a set $L$ is a level set of $f$ if $L \subseteq D$ and $f$ is equal to a constant value $k$ on $L$; that is, $f(\mathbf{x}) = k$ if $\mathbf{x} \in L$. If $n = 2$, we say that $L$ is a level curve or level contour; if $n = 3$, we say that $L$ is a level surface.

Example A level surface of the function $f(x, y, z) = x^2 + y^2 + z^2$, where $f(x, y, z) = k$ for a constant $k > 0$, is a sphere of radius $\sqrt{k}$. The level curves of the function $z = x^2 + y^2$ are circles of radius $\sqrt{k}$ with center $(0, 0, k)$, situated in the plane $z = k$, for each nonnegative number $k$.


1.2.2 Visualization Techniques

While it is always possible to obtain the graph of a function $f(x, y)$, for example, by substituting various values for its independent variables and plotting the corresponding points from the graph, this approach is not necessarily helpful for understanding the graph as a whole. Knowing the extent of the possible values of a function’s independent and dependent variables (the domain and range, respectively), along with the behavior of a few select curves that are contained within the function’s graph, can be more helpful. To that end, we mention the following useful techniques for acquiring this information.

Figure 1.1: Level curves of the function $z = x^2 + y^4$

• To find the domain and range of a function $f$, it is often necessary to account for the domains of functions that are included in the definition of $f$. For example, if there is a square root, it is necessary to avoid taking the square root of a negative number.

Example Let $f(x, y) = \ln(x^2 - y^2)$. Since $\ln t$ is only defined for $t > 0$, we must have $x^2 > y^2$, which, upon taking the square root of both sides, yields $|x| > |y|$. Therefore, this inequality defines the domain of $f$. The range of $f$ is the range of $\ln$, which is $\mathbb{R}$.

• To find the level set of a function $f(x_1, x_2, \ldots, x_n)$, solve the equation $f(x_1, x_2, \ldots, x_n) = k$, where $k$ is a constant. This equation will implicitly define the level set. In some cases, it can be solved for one of the independent variables to obtain an explicit function that describes the level set.

Figure 1.2: Sections of the function $z = x^2 + y^4$

Example Let $z = 2x + y$ be a function of two variables. The graph of this function is a plane. Each level set of this function is described by an equation of the form $2x + y = k$, where $k$ is a constant. Since $z = k$ as well, the graph of this level set is the line with equation $y = -2x + k$, contained within the plane $z = k$.

Example Let $z = \ln y - x$. Each level set of this function is described by an equation of the form $\ln y - x = k$, where $k$ is a constant. Exponentiating both sides of $\ln y = x + k$, we obtain $y = e^{x+k}$. It follows that the graph of this level set is that of the exponential function, contained within the plane $z = k$, and shifted $k$ units to the left (that is, along the $x$-axis).

• To help visualize a function of two variables $z = f(x, y)$, it can be helpful to use the method of sections. This involves viewing the function when restricted to “vertical” planes, such as the $xz$-plane and the $yz$-plane. To take these two sections, first set $y = 0$ to obtain $z$ as a function of $x$, and then graph that function in the $xz$-plane. Then, set $x = 0$ to obtain $z$ as a function of $y$, and graph that function in the $yz$-plane. Using these graphs as guides, in conjunction with level curves, it is then easier to visualize what the rest of the graph of $f$ looks like.

Example Let $z = x^2 + y^4$. Setting $y = 0$ yields $z = x^2$, the graph of which is a parabola in the $xz$-plane. Setting $x = 0$ yields $z = y^4$, which has a graph that is a parabola-like curve, flatter near the origin but increasing much more rapidly for $|y| > 1$. Combining these graphs with selected level curves, which are described by the equations $y = \pm\sqrt[4]{k - x^2}$, where $|x| \le \sqrt{k}$ for $k \ge 0$, allows us to visualize the graph of this function. Level curves and sections are shown in Figures 1.1 and 1.2, respectively.
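The explicit description of the level curves of $z = x^2 + y^4$ can be sanity-checked by generating points from $y = \pm\sqrt[4]{k - x^2}$ and substituting them back into the function. This Python sketch is our own illustration. The function names, the number of sample points, and the chosen levels $k$ are assumptions.

```python
import math


def level_curve_points(k, n=9):
    """Sample points on the level curve x^2 + y^4 = k (k >= 0) using the
    explicit form y = +/- (k - x^2)^(1/4) for -sqrt(k) <= x <= sqrt(k)."""
    half_width = math.sqrt(k)
    points = []
    for i in range(n):
        x = -half_width + 2 * half_width * i / (n - 1)
        w = max(k - x * x, 0.0)  # clip tiny negative round-off near |x| = sqrt(k)
        y = w ** 0.25
        points.append((x, y))
        points.append((x, -y))
    return points


def z(x, y):
    """The function z = x^2 + y^4 from the example."""
    return x**2 + y**4


for k in (0.25, 1.0, 4.0):
    worst = max(abs(z(x, y) - k) for x, y in level_curve_points(k))
    print(k, worst)  # largest deviation from the level k; round-off sized
```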

1.3 Limits and Continuity

Recall that in single-variable calculus, the fundamental concept of a limit was used to define derivatives and integrals of functions, as well as the notion of continuity of a function. We now generalize limits and continuity to the case of functions of several variables.


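• Let $f : D \subseteq \mathbb{R} \to \mathbb{R}$, and $a \in D$. We say $f(x)$ approaches $L$ as $x$ approaches $a$, and write

$$\lim_{x \to a} f(x) = L$$

if, for any $\varepsilon > 0$, there exists a $\delta > 0$ such that if $0 < |x - a| < \delta$, then $|f(x) - L| < \varepsilon$.

• If $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ is a point in $\mathbb{R}^n$, or, equivalently, if $\mathbf{x} = \langle x_1, x_2, \ldots, x_n \rangle$ is a position vector in $\mathbb{R}^n$, then the magnitude, or length, of $\mathbf{x}$, denoted by $\|\mathbf{x}\|$, is defined by

$$\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} = \left( \sum_{i=1}^n x_i^2 \right)^{1/2}.$$

Note that if $n = 1$, then $\mathbf{x}$ is actually a scalar $x$, and $\|\mathbf{x}\| = |x|$.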

Example If $\mathbf{x} = (3, -1, 4) \in \mathbb{R}^3$, then $\|\mathbf{x}\| = \sqrt{3^2 + (-1)^2 + 4^2} = \sqrt{26}$. □

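• Let $\mathbf{f} : D \subseteq \mathbb{R}^n \to \mathbb{R}^m$, and $\mathbf{a} \in D$. We say $\mathbf{f}(\mathbf{x})$ approaches $\mathbf{b}$ as $\mathbf{x}$ approaches $\mathbf{a}$, and write

$$\lim_{\mathbf{x} \to \mathbf{a}} \mathbf{f}(\mathbf{x}) = \mathbf{b},$$

if, for any $\varepsilon > 0$, no matter how small, there exists a $\delta > 0$ such that for any $\mathbf{x}$ such that $0 < \|\mathbf{x} - \mathbf{a}\| < \delta$, $\|\mathbf{f}(\mathbf{x}) - \mathbf{b}\| < \varepsilon$. This definition is illustrated in Figure 1.3. Note that the condition $\|\mathbf{x} - \mathbf{a}\| > 0$ specifically excludes consideration of $\mathbf{x} = \mathbf{a}$, because limits are used to understand the behavior of a function near a point, not at a point.

Figure 1.3: Illustration of the limit, as $\mathbf{x}$ approaches $\mathbf{a}$ (left plot), of $\mathbf{f}(\mathbf{x})$ being equal to $\mathbf{b}$ (right plot). For any ball around the point $\mathbf{b}$ of radius $\varepsilon$ (right plot), no matter how small, there exists a ball around the point $\mathbf{a}$, of radius $\delta$ (left plot), such that every point in the ball around $\mathbf{a}$ is mapped by $\mathbf{f}$ to a point in the ball around $\mathbf{b}$.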

Example Consider the function

$$f(x, y) = \frac{xy}{\sqrt{x^2 + y^2}}.$$

We will use the definition of a limit to show that as $(x, y) \to (0, 0)$, $f(x, y) \to 0$. Let $\varepsilon > 0$. We need to show that there exists some $\delta > 0$ such that if $0 < \|(x, y) - (0, 0)\| = \sqrt{x^2 + y^2} < \delta$, then $|f(x, y) - 0| = \left| \dfrac{xy}{\sqrt{x^2 + y^2}} \right| < \varepsilon$. First, we note that, because $|y| = \sqrt{y^2} \le \sqrt{x^2 + y^2}$,

$$\left| \frac{y}{\sqrt{x^2 + y^2}} \right| \le 1.$$

Therefore, if we set $\delta = \varepsilon$, we obtain

$$\left| \frac{xy}{\sqrt{x^2 + y^2}} \right| = |x| \left| \frac{y}{\sqrt{x^2 + y^2}} \right| \le |x| = \sqrt{x^2} \le \sqrt{x^2 + y^2} < \delta = \varepsilon,$$

from which it follows that the limit exists and is equal to zero. □

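• Let $\mathbf{f} : D \subseteq \mathbb{R}^n \to \mathbb{R}^m$, and let $\mathbf{a} \in D$. We say that $\mathbf{f}$ is continuous at $\mathbf{a}$ if $\lim_{\mathbf{x} \to \mathbf{a}} \mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{a})$.

• Let $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$. We say that $f$ is a polynomial if, for each $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in D$, $f(\mathbf{x})$ is equal to a sum of terms of the form $c\, x_1^{p_1} x_2^{p_2} \cdots x_n^{p_n}$, where $c$ is a constant and $p_1, p_2, \ldots, p_n$ are nonnegative integers.

Example The functions $f(x) = x^3 + 3x^2 + 2x + 1$, $g(x, y) = x^2 y^3 + 3xy + x^2 + 1$, and $h(x, y, z) = 4xy^2z^3 + 8yz^2$ are all examples of polynomials. □

• Let $f, p, q : D \subseteq \mathbb{R}^n \to \mathbb{R}$, and let $q(\mathbf{x}) \neq 0$ on $D$. We say that $f$ is a rational function if $p$ and $q$ are both polynomials and $f(\mathbf{x}) = p(\mathbf{x})/q(\mathbf{x})$.

Example The functions $f(x) = 1/(x + 1)$, $g(x, y) = xy^2/(x^2 + y^3)$, and $h(x, y, z) = (xy^2 + z^3)/(x^2 z + xyz^2 + yz^3)$ are all examples of rational functions. □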

• An algebraic function is a function that satisfies a polynomial equation whose coefficients are themselves polynomials.

Example The square root function $y = \sqrt{x}$ is an algebraic function, because it satisfies the equation $y^2 - x = 0$. The function $y = x^{5/2} + x^{3/2}$ is also an algebraic function, because it satisfies the equation $x^5 + 2x^4 + x^3 - y^2 = 0$. □
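The $\varepsilon$-$\delta$ argument in the example with $f(x, y) = xy/\sqrt{x^2 + y^2}$ can be spot-checked by sampling. The Python sketch below is only a sanity check of our own devising. It draws random points with $0 < \|(x, y)\| < \delta$ for the choice $\delta = \varepsilon$ and confirms that $|f(x, y)| < \varepsilon$ at each one. Sampling cannot prove a limit; it can only fail to find a counterexample. The helper `norm` implements the magnitude formula above; the names, sample counts and seed are arbitrary assumptions.

```python
import math
import random


def norm(v):
    """Euclidean magnitude ||v|| = (sum of v_i^2)^(1/2)."""
    return math.sqrt(sum(c * c for c in v))


def f(x, y):
    """f(x, y) = x y / sqrt(x^2 + y^2), defined for (x, y) != (0, 0)."""
    return x * y / math.sqrt(x * x + y * y)


def check_delta(eps, trials=10_000, seed=0):
    """Spot-check the choice delta = eps: sample points with
    0 < ||(x, y)|| < delta and confirm |f(x, y) - 0| < eps."""
    rng = random.Random(seed)
    delta = eps
    for _ in range(trials):
        rho = rng.uniform(0.0, delta)
        theta = rng.uniform(0.0, 2 * math.pi)
        x, y = rho * math.cos(theta), rho * math.sin(theta)
        if norm((x, y)) == 0.0:
            continue  # the definition excludes the point a = (0, 0) itself
        if abs(f(x, y)) >= eps:
            return False
    return True


print(norm((3, -1, 4)), math.sqrt(26))  # the magnitude example above
print(check_delta(1e-3), check_delta(0.5))
```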

1.3.2 Defining Limits Using Neighborhoods

An alternative approach to defining limits involves the concept of a neighborhood, which generalizes open intervals on the real number line.


• Let $\mathbf{x}_0 \in \mathbb{R}^n$ and let $r > 0$. We define the ball centered at $\mathbf{x}_0$ of radius $r$, denoted by $D_r(\mathbf{x}_0)$, to be the set of all points $\mathbf{x} \in \mathbb{R}^n$ such that $\|\mathbf{x} - \mathbf{x}_0\| < r$.

Example In 1-D, the open interval $(0, 1)$ is also the ball centered at $x_0 = 1/2$ of radius $r = 1/2$. In 3-D, the inside of the sphere with center $(0, 0, 0)$ and radius 2, $\{(x, y, z) \mid x^2 + y^2 + z^2 < 4\}$, is also the ball $D_2((0, 0, 0))$. □

• We say that a set $U \subseteq \mathbb{R}^n$ is open if, for any point $\mathbf{x}_0 \in U$, there exists an $r > 0$ such that $D_r(\mathbf{x}_0) \subseteq U$.

Example In 1-D, any open set is an open interval, such as $(-1, 1)$, or a union of open intervals. In 2-D, the interior of the ellipse defined by the equation $4x^2 + 9y^2 = 1$ is an open set; the ellipse itself is not included. □

• Let $\mathbf{x}_0 \in \mathbb{R}^n$. We say that $N$ is a neighborhood of $\mathbf{x}_0$ if $N$ is an open set that contains $\mathbf{x}_0$.

• Let $A \subseteq \mathbb{R}^n$ be an open set. We say that $\mathbf{x}_0 \in \mathbb{R}^n$ is a boundary point of $A$ if every neighborhood of $\mathbf{x}_0$ contains at least one point in $A$ and one point not in $A$.

Example Let $D = \{(x, y) \mid x^2 + y^2 < 1\}$, which is often called the unit ball in $\mathbb{R}^2$. This set consists of all points inside the unit circle with center $(0, 0)$ and radius 1, not including the circle itself. The point $(x_0, y_0) = (\sqrt{2}/2, \sqrt{2}/2)$, which is on the circle, is a boundary point of $D$ because, as illustrated in Figure 1.4, any neighborhood of $(x_0, y_0)$ must contain points inside the circle, and points that are outside. □

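• Let $\mathbf{f} : D \subseteq \mathbb{R}^n \to \mathbb{R}^m$, and let $\mathbf{a} \in D$ or let $\mathbf{a}$ be a boundary point of $D$. We say that $\mathbf{f}(\mathbf{x})$ approaches $\mathbf{b}$ as $\mathbf{x}$ approaches $\mathbf{a}$, and write

$$\lim_{\mathbf{x} \to \mathbf{a}} \mathbf{f}(\mathbf{x}) = \mathbf{b},$$

if, for any neighborhood $N$ of $\mathbf{b}$, there exists a neighborhood $U$ of $\mathbf{a}$ such that if $\mathbf{x} \in U \cap D$ and $\mathbf{x} \neq \mathbf{a}$, then $\mathbf{f}(\mathbf{x}) \in N$.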
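For the boundary-point example with $D = \{(x, y) \mid x^2 + y^2 < 1\}$ and $(x_0, y_0) = (\sqrt{2}/2, \sqrt{2}/2)$, explicit witnesses can be written down for any ball $D_r((x_0, y_0))$ with $0 < r \le 1$. Scaling the point by $1 - r/2$ lands inside $D$, and scaling by $1 + r/2$ lands outside, each moving a distance of only $r/2$. Larger balls contain these smaller ones. The Python sketch below is our own illustration, with arbitrary names and radii, including $r = \sqrt{0.1}$ as in Figure 1.4. Checking finitely many radii only illustrates the scaling argument.

```python
import math

X0 = Y0 = math.sqrt(2) / 2  # the point (sqrt(2)/2, sqrt(2)/2) on the unit circle


def in_D(x, y):
    """Membership test for the open unit disk D = {(x, y) | x^2 + y^2 < 1}."""
    return x * x + y * y < 1


def witnesses(r):
    """Return two points of the ball D_r((X0, Y0)), assuming 0 < r <= 1:
    one scaled toward the origin (in D) and one scaled away from it (not in D).
    Each lies at distance r/2 from (X0, Y0), so both are inside the ball."""
    inside = ((1 - r / 2) * X0, (1 - r / 2) * Y0)
    outside = ((1 + r / 2) * X0, (1 + r / 2) * Y0)
    return inside, outside


for r in (math.sqrt(0.1), 1e-2, 1e-6):
    inside, outside = witnesses(r)
    print(r, in_D(*inside), in_D(*outside))  # True, False for each r
```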

1.3.3 Results

In the statement of the following results concerning limits and continuity, $\mathbf{f}, \mathbf{g} : D \subseteq \mathbb{R}^n \to \mathbb{R}^m$, $\mathbf{a} \in D$ or $\mathbf{a}$ is a boundary point of $D$, $\mathbf{b}, \mathbf{b}_1, \mathbf{b}_2 \in \mathbb{R}^m$, and $c \in \mathbb{R}$.

Figure 1.4: Boundary point $(x_0, y_0)$ of the set $D = \{(x, y) \mid x^2 + y^2 < 1\}$. The neighborhood of $(x_0, y_0)$ shown, $D_r((x_0, y_0)) = \{(x, y) \mid (x - x_0)^2 + (y - y_0)^2 < 0.1\}$, contains points that are in $D$ and points that are not in $D$.

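• The limit of a function $\mathbf{f}(\mathbf{x})$ as $\mathbf{x}$ approaches $\mathbf{a}$, if it exists, is unique. That is, if

$$\lim_{\mathbf{x} \to \mathbf{a}} \mathbf{f}(\mathbf{x}) = \mathbf{b}_1 \quad \text{and} \quad \lim_{\mathbf{x} \to \mathbf{a}} \mathbf{f}(\mathbf{x}) = \mathbf{b}_2,$$

then $\mathbf{b}_1 = \mathbf{b}_2$. It follows that if $\mathbf{f}(\mathbf{x})$ approaches two distinct values as $\mathbf{x}$ approaches $\mathbf{a}$ along two distinct paths, then the limit as $\mathbf{x}$ approaches $\mathbf{a}$ does not exist.

• If $\lim_{\mathbf{x} \to \mathbf{a}} \mathbf{f}(\mathbf{x}) = \mathbf{b}$, then $\lim_{\mathbf{x} \to \mathbf{a}} c\mathbf{f}(\mathbf{x}) = c\mathbf{b}$.

• If $\lim_{\mathbf{x} \to \mathbf{a}} \mathbf{f}(\mathbf{x}) = \mathbf{b}_1$ and $\lim_{\mathbf{x} \to \mathbf{a}} \mathbf{g}(\mathbf{x}) = \mathbf{b}_2$, then

$$\lim_{\mathbf{x} \to \mathbf{a}} (\mathbf{f} + \mathbf{g})(\mathbf{x}) = \mathbf{b}_1 + \mathbf{b}_2.$$

Furthermore, if $m = 1$, then

$$\lim_{\mathbf{x} \to \mathbf{a}} (fg)(\mathbf{x}) = b_1 b_2.$$

• If $m = 1$ and $\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x}) = b \neq 0$, and $f(\mathbf{x}) \neq 0$ in a neighborhood of $\mathbf{a}$, then

$$\lim_{\mathbf{x} \to \mathbf{a}} \frac{1}{f(\mathbf{x})} = \frac{1}{b}.$$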

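• If $\mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_m(\mathbf{x}))$, where $f_1, f_2, \ldots, f_m$ are the component functions of $\mathbf{f}$, and $\mathbf{b} = (b_1, b_2, \ldots, b_m)$, then $\lim_{\mathbf{x} \to \mathbf{a}} \mathbf{f}(\mathbf{x}) = \mathbf{b}$ if and only if $\lim_{\mathbf{x} \to \mathbf{a}} f_i(\mathbf{x}) = b_i$ for $i = 1, 2, \ldots, m$.

• If $\mathbf{f}$ and $\mathbf{g}$ are continuous at $\mathbf{a}$, then so are $c\mathbf{f}$ and $\mathbf{f} + \mathbf{g}$. If, in addition, $m = 1$, then $fg$ is continuous at $\mathbf{a}$. Furthermore, if $m = 1$ and if $f$ is nonzero in a neighborhood of $\mathbf{a}$, then $1/f$ is continuous at $\mathbf{a}$.

• If $\mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_m(\mathbf{x}))$, where $f_1, f_2, \ldots, f_m$ are the component functions of $\mathbf{f}$, then $\mathbf{f}$ is continuous at $\mathbf{a}$ if and only if $f_i$ is continuous at $\mathbf{a}$, for $i = 1, 2, \ldots, m$.

• Any polynomial function $f : \mathbb{R}^n \to \mathbb{R}$ is continuous on all of $\mathbb{R}^n$.

• Any rational function $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ is continuous wherever it is defined.

Example The function $f(x, y) = 2x/(x^2 - y^2)$ is defined on all of $\mathbb{R}^2$ except where $x^2 - y^2 = 0$; that is, where $|x| = |y|$. Therefore, $f$ is continuous at every point where $|x| \neq |y|$. □

• Let $\mathbf{f} : D \subseteq \mathbb{R}^n \to \mathbb{R}^m$, and let $\mathbf{g} : U \subseteq \mathbb{R}^p \to D$. If the composition $(\mathbf{f} \circ \mathbf{g})(\mathbf{x}) = \mathbf{f}(\mathbf{g}(\mathbf{x}))$ is defined on $U$, then $\mathbf{f} \circ \mathbf{g}$ is continuous at $\mathbf{a} \in U$ if $\mathbf{g}$ is continuous at $\mathbf{a}$ and $\mathbf{f}$ is continuous at $\mathbf{g}(\mathbf{a})$.

Example The function $g(x, y) = x^2 + y^2$, being a polynomial, is continuous on all of $\mathbb{R}^2$. The function $f(z) = \sin z$ is continuous on all of $\mathbb{R}$. Therefore, the composition $(f \circ g)(x, y) = f(g(x, y)) = \sin(x^2 + y^2)$ is continuous on all of $\mathbb{R}^2$. □

• Algebraic functions, such as $x^r$ where $r$ is any rational number (for example, $f(x) = \sqrt{x}$), and trigonometric functions, such as $\sin x$ or $\tan x$, are continuous wherever they are defined.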


1.3.4 Techniques for Establishing Limits and Continuity

We now discuss some techniques for computing limits of functions of several variables, or determining that they do not exist. We also demonstrate how to determine if a function is continuous at a point.

To show that the limit of a function $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ as $\mathbf{x} \to \mathbf{a}$ does not exist, try letting $\mathbf{x}$ approach $\mathbf{a}$ along different paths to see if different values are approached. If they are, then the limit does not exist.

For example, let $n = 2$ and let $\mathbf{x} = (x, y)$ and $\mathbf{a} = (a_1, a_2)$. Then, try setting $x = a_1$ in the formula for $f(x, y)$ and letting $y$ approach $a_2$, or vice versa. Other possible paths include, for example, setting $x = cy$, where $c = a_1/a_2$, if $a_2 \neq 0$, and letting $y$ approach $a_2$, or considering the cases of $x < a_1$ and $x > a_1$, or $y < a_2$ and $y > a_2$, separately.

Example Let $f(x, y) = x^3 y/(x^4 + y^4)$. If we let $(x, y) \to (0, 0)$ by first setting $y = 0$ and then letting $x \to 0$, we observe that $f(x, 0) = x^3(0)/(x^4 + 0) = 0$ for all $x \neq 0$. This suggests that $f(x, y) \to 0$ as $(x, y) \to (0, 0)$. However, if we set $x = y$ and let $x, y \to 0$ together, we note that $f(x, x) = x^3 x/(x^4 + x^4) = x^4/(2x^4) = 1/2$, which suggests that the limit is equal to $1/2$. We conclude that the limit does not exist. □
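The path test in this example is easy to reproduce numerically. The sketch below is an illustration of our own; the sample values of $t$ and the slope $m = 2$ are arbitrary. It evaluates $f(x, y) = x^3 y/(x^4 + y^4)$ along the $x$-axis, along the line $y = x$, and more generally along $y = mx$, where $f(t, mt) = m/(1 + m^4)$ for $t \neq 0$. Different slopes give different constant values, which is exactly the behavior that rules out a limit.

```python
def f(x, y):
    """f(x, y) = x^3 y / (x^4 + y^4), defined for (x, y) != (0, 0)."""
    return x**3 * y / (x**4 + y**4)


ts = [10.0 ** (-k) for k in range(1, 7)]  # t -> 0 through 0.1, 0.01, ..., 1e-6

along_x_axis = [f(t, 0.0) for t in ts]  # path y = 0: every value is 0
along_diagonal = [f(t, t) for t in ts]  # path y = x: every value is 1/2 up to round-off


def along_line(m, t):
    """Value of f on the line y = m x at x = t; equals m / (1 + m^4) for t != 0."""
    return f(t, m * t)


print(along_x_axis)
print(along_diagonal)
print(along_line(2.0, 1e-3), 2.0 / 17.0)
```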

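To show that the limit of a function $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ as $\mathbf{x} \to \mathbf{a}$ does exist and is equal to $b$, use the definition of a limit by first assuming $\varepsilon > 0$, and then trying to find a $\delta$ so that $|f(\mathbf{x}) - b| < \varepsilon$ whenever $0 < \|\mathbf{x} - \mathbf{a}\| < \delta$.

To that end, try to find an upper bound on $|f(\mathbf{x}) - b|$ in terms of $\|\mathbf{x} - \mathbf{a}\|$. Specifically, if it can be shown that $|f(\mathbf{x}) - b| \le g(\|\mathbf{x} - \mathbf{a}\|)$, where $g$ is an invertible, strictly increasing function, then a suitable choice for $\delta$ is $\delta = g^{-1}(\varepsilon)$. Then, if $\|\mathbf{x} - \mathbf{a}\| < \delta = g^{-1}(\varepsilon)$, then

$$|f(\mathbf{x}) - b| \le g(\|\mathbf{x} - \mathbf{a}\|) < g(g^{-1}(\varepsilon)) = \varepsilon.$$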

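Example Let $f(x, y) = (x^3 - y^3)/(x^2 + y^2)$. Letting $(x, y) \to (0, 0)$ along various paths, it appears that $f(x, y) \to 0$ as $(x, y) \to (0, 0)$. To confirm this, we assume $\varepsilon > 0$ and try to find $\delta > 0$ such that if $0 < \sqrt{x^2 + y^2} < \delta$, then $|(x^3 - y^3)/(x^2 + y^2)| < \varepsilon$.

Factoring the numerator of $f(x, y)$, we obtain

$$\left| \frac{x^3 - y^3}{x^2 + y^2} \right| = \left| \frac{(x - y)(x^2 + xy + y^2)}{x^2 + y^2} \right| = \left| (x - y)\left( 1 + \frac{xy}{x^2 + y^2} \right) \right|.$$

Using $|x| = \sqrt{x^2} \le \sqrt{x^2 + y^2}$, and similarly $|y| \le \sqrt{x^2 + y^2}$, yields

$$\begin{aligned} \left| \frac{x^3 - y^3}{x^2 + y^2} \right| &= |x - y| \left| 1 + \left( \frac{x}{\sqrt{x^2 + y^2}} \right) \left( \frac{y}{\sqrt{x^2 + y^2}} \right) \right| \\ &\le 2|x - y| \\ &\le 2(|x| + |y|) \\ &\le 4\sqrt{x^2 + y^2}. \end{aligned}$$

Therefore, if we let $\delta = \varepsilon/4$, it follows that when $\sqrt{x^2 + y^2} < \delta$, then $|f(x, y)| < 4\delta = 4(\varepsilon/4) = \varepsilon$, and therefore the limit exists and is equal to zero. □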
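The key inequality in this argument, $|f(x, y)| \le 4\sqrt{x^2 + y^2}$, can be spot-checked on random points, together with the resulting choice $\delta = \varepsilon/4$. This Python sketch is a numerical sanity check of our own, not a substitute for the proof. The sampling region, sample counts, seeds, and helper names are arbitrary assumptions.

```python
import math
import random


def f(x, y):
    """f(x, y) = (x^3 - y^3) / (x^2 + y^2), defined for (x, y) != (0, 0)."""
    return (x**3 - y**3) / (x**2 + y**2)


def bound_holds(samples=20_000, seed=1):
    """Check |f(x, y)| <= 4 * sqrt(x^2 + y^2) at random points of [-1, 1]^2."""
    rng = random.Random(seed)
    for _ in range(samples):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        r = math.hypot(x, y)
        if r > 0.0 and abs(f(x, y)) > 4.0 * r:
            return False
    return True


def delta_works(eps, samples=20_000, seed=2):
    """Check that 0 < sqrt(x^2 + y^2) < eps / 4 implies |f(x, y)| < eps."""
    rng = random.Random(seed)
    delta = eps / 4.0
    for _ in range(samples):
        rho, theta = rng.uniform(0.0, delta), rng.uniform(0.0, 2.0 * math.pi)
        x, y = rho * math.cos(theta), rho * math.sin(theta)
        if math.hypot(x, y) > 0.0 and abs(f(x, y)) >= eps:
            return False
    return True


print(bound_holds(), delta_works(1e-2))
```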

Sometimes, it is helpful to simplify the formula for a function before attempting to determine its limit.

Example Let

$$f(x, y) = \frac{(x + y)^2 - (x - y)^2}{xy}, \quad x, y \neq 0.$$

Expanding the numerator yields, for $x, y \neq 0$,

$$f(x, y) = \frac{(x^2 + 2xy + y^2) - (x^2 - 2xy + y^2)}{xy} = \frac{4xy}{xy} = 4.$$

Therefore, even though $f(x, y)$ is not defined at $(0, 0)$, its limit as $(x, y) \to (0, 0)$ exists, and is equal to 4. This example demonstrates that a limit depends only on the behavior of a function near a particular point; what happens at that point is irrelevant. □
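Evaluating the unsimplified formula at points approaching $(0, 0)$ shows the same constant value along several paths. In the Python sketch below (our own illustration), the three paths and the sample values of $t$ are arbitrary choices; any path with $x, y \neq 0$ would do.

```python
def f(x, y):
    """f(x, y) = ((x + y)^2 - (x - y)^2) / (x y), defined only for x, y != 0."""
    return ((x + y) ** 2 - (x - y) ** 2) / (x * y)


ts = [10.0 ** (-k) for k in range(1, 6)]  # t = 0.1, 0.01, ..., 1e-5

paths = {
    "y = x": [f(t, t) for t in ts],
    "y = -3x": [f(t, -3 * t) for t in ts],
    "y = x^2": [f(t, t * t) for t in ts],
}
for name, values in paths.items():
    print(name, values)  # every entry is 4 up to floating-point round-off
```

Along $y = x^2$ the two squares in the numerator nearly cancel as $t$ shrinks, so the computed values may differ from 4 in the last few digits; the simplified form $f = 4$ avoids that subtraction entirely.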

In many cases, determining whether a function $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ is continuous can be accomplished by applying the various properties of continuous functions stated above, and using the fact that various types of functions, such as polynomial and rational functions, are known to be continuous wherever they are defined.

Example Let $\mathbf{c} = \langle 2, -1, 3 \rangle$, and let $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by $\mathbf{f}(\mathbf{x}) = \mathbf{c} \times \mathbf{x}$, the cross product of the vector $\mathbf{c}$ and the vector $\mathbf{x} = \langle x_1, x_2, x_3 \rangle$. This function is continuous on all of $\mathbb{R}^3$, because

$$\mathbf{f}(\mathbf{x}) = \mathbf{c} \times \mathbf{x} = \langle -x_3 - 3x_2, 3x_1 - 2x_3, 2x_2 + x_1 \rangle,$$

and each component function of $\mathbf{f}$ can be seen to be not only a polynomial, but a linear function. □
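The component formula for $\mathbf{c} \times \mathbf{x}$ can be confirmed by computing the cross product directly and comparing. The Python sketch below is our own check. The helper `cross` and the integer test vectors are illustrative choices, and integer inputs keep the arithmetic exact. It also checks additivity, $\mathbf{f}(\mathbf{x} + \mathbf{y}) = \mathbf{f}(\mathbf{x}) + \mathbf{f}(\mathbf{y})$, consistent with each component being linear.

```python
def cross(u, v):
    """Cross product u x v of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


c = (2, -1, 3)


def f(x):
    """f(x) = c x x, computed directly from the cross product."""
    return cross(c, x)


def f_formula(x):
    """The component formula <-x3 - 3 x2, 3 x1 - 2 x3, 2 x2 + x1> from the text."""
    x1, x2, x3 = x
    return (-x3 - 3 * x2, 3 * x1 - 2 * x3, 2 * x2 + x1)


samples = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (4, -7, 2), (-3, 5, 9)]
for x in samples:
    assert f(x) == f_formula(x)

x, y = (4, -7, 2), (-3, 5, 9)
x_plus_y = tuple(a + b for a, b in zip(x, y))
assert f(x_plus_y) == tuple(a + b for a, b in zip(f(x), f(y)))
print("component formula and additivity confirmed on the sample vectors")
```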


However, in cases where a function is defined in a piecewise manner, continuity at boundaries between pieces must be determined by applying the definition of continuity directly, which requires computing limits.

Example Let

$$f(x, y) = \begin{cases} \dfrac{xy^2}{2x^2 + y^2} & (x, y) \neq (0, 0) \\[6pt] 0 & (x, y) = (0, 0). \end{cases}$$

For $(x, y) \neq (0, 0)$, $f$ is continuous at $(x, y)$ because it is a rational function that is defined there. As $(x, y) \to (0, 0)$, $f(x, y) \to 0$, as can be shown by applying the definition of a limit with $\delta = \varepsilon$: since $y^2/(2x^2 + y^2) \le 1$, we have $|f(x, y)| \le |x| \le \sqrt{x^2 + y^2} < \delta = \varepsilon$. Because this limit is equal to $f(0, 0) = 0$, we conclude that $f$ is continuous at $(0, 0)$ as well. □
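The piecewise definition can be coded directly, with the point $(0, 0)$ handled as its own case, and the bound $|f(x, y)| \le \sqrt{x^2 + y^2}$ behind the choice $\delta = \varepsilon$ can be spot-checked on random points. This Python sketch is our own illustration. The sample region, count, seed, and the tiny relative slack allowed for floating-point round-off are assumptions.

```python
import math
import random


def f(x, y):
    """Piecewise function: x y^2 / (2 x^2 + y^2) away from the origin, 0 at (0, 0)."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y**2 / (2 * x**2 + y**2)


def bound_holds(samples=20_000, seed=3):
    """Check |f(x, y)| <= sqrt(x^2 + y^2) (up to round-off) at random points."""
    rng = random.Random(seed)
    for _ in range(samples):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if abs(f(x, y)) > math.hypot(x, y) * (1 + 1e-12):
            return False
    return True


# Along the diagonal y = x the values equal t / 3, which shrink to f(0, 0) = 0.
diagonal = [f(t, t) for t in (1e-1, 1e-2, 1e-3)]
print(bound_holds(), diagonal, f(0.0, 0.0))
```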

1.4 Partial Derivatives

Now that we have become acquainted with functions of several variables, and what it means for such functions to have limits and be continuous, we are ready to analyze their behavior by computing their instantaneous rates of change, as we know how to do for functions of a single variable. However, in contrast to the single-variable case, the instantaneous rate of change of a function of several variables cannot be described by a single number that represents the slope of a tangent line. Instead, such a slope can only describe how one of the function's dependent variables (outputs) varies as one of its independent variables (inputs) changes. This leads to the concept of what is known as a partial derivative.


Let $f : D \subseteq \mathbb{R} \to \mathbb{R}$ be a scalar-valued function of a single variable. Recall that the derivative of $f(x)$ with respect to $x$ at $x_0$ is defined to be

$$\frac{df}{dx}(x_0) = f'(x_0) = \lim_{h \to 0} \frac{f(x_0+h) - f(x_0)}{h}.$$

Now, let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ be a scalar-valued function of two variables, and let $(x_0, y_0) \in D$. The partial derivative of $f(x,y)$ with respect to $x$ at $(x_0,y_0)$ is defined to be

$$\frac{\partial f}{\partial x}(x_0,y_0) = f_x(x_0,y_0) = \lim_{h \to 0} \frac{f(x_0+h, y_0) - f(x_0,y_0)}{h}.$$


Note that only values of $f(x,y)$ for which $y = y_0$ influence the value of the partial derivative with respect to $x$. Similarly, the partial derivative of $f(x,y)$ with respect to $y$ at $(x_0,y_0)$ is defined to be

$$\frac{\partial f}{\partial y}(x_0,y_0) = f_y(x_0,y_0) = \lim_{h \to 0} \frac{f(x_0, y_0+h) - f(x_0,y_0)}{h}.$$

Note the two methods of denoting partial derivatives used above: $\partial f/\partial x$ or $f_x$ for the partial derivative with respect to $x$. There are other notations, but these are the ones that we will use.

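Example Let $f(x,y) = x^2y$, and let $(x_0,y_0) = (2,-1)$. Then

$$\begin{aligned}
f_x(2,-1) &= \lim_{h\to 0} \frac{(2+h)^2(-1) - 2^2(-1)}{h} = \lim_{h\to 0} \frac{-(4+4h+h^2) + 4}{h} = \lim_{h\to 0} \frac{-4h - h^2}{h} = -4, \\
f_y(2,-1) &= \lim_{h\to 0} \frac{2^2(-1+h) - 2^2(-1)}{h} = \lim_{h\to 0} \frac{4(h-1) + 4}{h} = \lim_{h\to 0} \frac{4h}{h} = 4.
\end{aligned}$$

□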
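The limit definition can be explored numerically by evaluating the difference quotients for shrinking $h$. In this Python sketch (the step sizes are arbitrary), the $x$-quotient equals $-4 - h$ exactly in real arithmetic and the $y$-quotient equals 4, so they approach $f_x(2,-1) = -4$ and $f_y(2,-1) = 4$ up to floating-point rounding.

```python
def f(x, y):
    return x**2 * y

x0, y0 = 2.0, -1.0
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    dq_x = (f(x0 + h, y0) - f(x0, y0)) / h   # equals -4 - h in exact arithmetic
    dq_y = (f(x0, y0 + h) - f(x0, y0)) / h   # equals 4 in exact arithmetic
    print(f"h={h:g}  x-quotient={dq_x:.6f}  y-quotient={dq_y:.6f}")
```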

In the preceding example, the value $f_x(2,-1) = -4$ can be interpreted as the slope of the line that is tangent to the graph of $f(x,-1) = -x^2$ at $x = 2$. That is, we consider the restriction of $f$ to the portion of its domain where $y = -1$, and thus obtain a function of the single variable $x$, $g(x) = f(x,-1) = -x^2$. Note that if we apply differentiation rules from single-variable calculus to $g$, we obtain $g'(x) = -2x$, and $g'(2) = -4$, which is the value we obtained for $f_x(2,-1)$.

Similarly, if we consider $f_y(2,-1) = 4$, this can be interpreted as the slope of a line that is tangent to the graph of $p(y) = f(2,y) = 4y$ at $y = -1$. Note that if we differentiate $p$, we obtain $p'(y) = 4$, which, again, shows that the partial derivative of a function of several variables can be obtained by "freezing" the values of all variables except the one with respect to which we are differentiating, and then applying differentiation rules to the resulting function of one variable.

It follows from this relationship between partial derivatives of a function of several variables and the derivative of a function of a single variable that other interpretations of the derivative are also applicable to partial derivatives. In particular, if $f_x(x_0,y_0) > 0$, which is equivalent to $g'(x_0) > 0$ where $g(x) = f(x,y_0)$, we can conclude that $f$ is increasing as $x$ increases through $x_0$ along the line $y = y_0$. Similarly, if $f_y(x_0,y_0) < 0$, which is equivalent to $p'(y_0) < 0$ where $p(y) = f(x_0,y)$, we can conclude that $f$ is decreasing as $y$ increases through $y_0$ along the line $x = x_0$.

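We now define partial derivatives for a function of $n$ variables. Let the vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n \in \mathbb{R}^n$ be defined as follows: for each $i = 1, 2, \ldots, n$, $\mathbf{e}_i$ has components that are all equal to zero, except the $i$th component, which is equal to 1. These vectors are called the standard basis vectors of $\mathbb{R}^n$.

Example If $n = 3$, then

$$\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{e}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

We also have that $\mathbf{e}_1 = \mathbf{i}$, $\mathbf{e}_2 = \mathbf{j}$ and $\mathbf{e}_3 = \mathbf{k}$. □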

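Let $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ be a scalar-valued function of $n$ variables $x_1, x_2, \ldots, x_n$. Then, the partial derivative of $f$ with respect to $x_i$ at $\mathbf{x}_0 \in D$, where $1 \le i \le n$, is defined to be

$$\begin{aligned}
\frac{\partial f}{\partial x_i}(\mathbf{x}_0) = f_{x_i}(\mathbf{x}_0) &= \lim_{h\to 0} \frac{f(\mathbf{x}_0 + h\mathbf{e}_i) - f(\mathbf{x}_0)}{h} \\
&= \lim_{h\to 0} \frac{f(x_1, \ldots, x_i + h, \ldots, x_n) - f(x_1, \ldots, x_n)}{h},
\end{aligned}$$

where, in the second line, $\mathbf{x}_0 = (x_1, \ldots, x_n)$ is written in components.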

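Example Let $f : \mathbb{R}^4 \to \mathbb{R}$ be defined by $f(\mathbf{x}) = (\mathbf{c}\cdot\mathbf{x})^2$, where $\mathbf{c} \in \mathbb{R}^4$ is the vector $\mathbf{c} = \langle 4,-3,2,-1\rangle$. Let $\mathbf{x}_0 \in \mathbb{R}^4$ be the point $\mathbf{x}_0 = \langle 1,3,2,4\rangle$. Then, the partial derivative of $f$ with respect to $x_2$ at $\mathbf{x}_0$ is given by

$$\begin{aligned}
f_{x_2}(\mathbf{x}_0) &= \lim_{h\to 0} \frac{f(\mathbf{x}_0 + h\mathbf{e}_2) - f(\mathbf{x}_0)}{h} \\
&= \lim_{h\to 0} \frac{(\mathbf{c}\cdot(\mathbf{x}_0 + h\mathbf{e}_2))^2 - (\mathbf{c}\cdot\mathbf{x}_0)^2}{h} \\
&= \lim_{h\to 0} \frac{(\mathbf{c}\cdot\mathbf{x}_0 + h\,\mathbf{c}\cdot\mathbf{e}_2)^2 - (\mathbf{c}\cdot\mathbf{x}_0)^2}{h} \\
&= \lim_{h\to 0} \frac{(\mathbf{c}\cdot\mathbf{x}_0)^2 + 2(\mathbf{c}\cdot\mathbf{x}_0)(h\,\mathbf{c}\cdot\mathbf{e}_2) + (h\,\mathbf{c}\cdot\mathbf{e}_2)^2 - (\mathbf{c}\cdot\mathbf{x}_0)^2}{h} \\
&= \lim_{h\to 0} \frac{2h(\mathbf{c}\cdot\mathbf{x}_0)(\mathbf{c}\cdot\mathbf{e}_2) + h^2(\mathbf{c}\cdot\mathbf{e}_2)^2}{h} \\
&= \lim_{h\to 0} \left[2(\mathbf{c}\cdot\mathbf{x}_0)(\mathbf{c}\cdot\mathbf{e}_2) + h(\mathbf{c}\cdot\mathbf{e}_2)^2\right] \\
&= 2(\mathbf{c}\cdot\mathbf{x}_0)(\mathbf{c}\cdot\mathbf{e}_2) \\
&= 2(\langle 4,-3,2,-1\rangle\cdot\langle 1,3,2,4\rangle)(\langle 4,-3,2,-1\rangle\cdot\langle 0,1,0,0\rangle) \\
&= 2[4(1) - 3(3) + 2(2) - 1(4)](-3) \\
&= 2(-5)(-3) \\
&= 30.
\end{aligned}$$

This shows that $f$ is increasing sharply as a function of $x_2$ at the point $\mathbf{x}_0$. Note that the same result can be obtained by defining

$$g(x_2) = f(1, x_2, 2, 4) = (\mathbf{c}\cdot\langle 1, x_2, 2, 4\rangle)^2 = (\langle 4,-3,2,-1\rangle\cdot\langle 1, x_2, 2, 4\rangle)^2 = (4 - 3x_2)^2,$$

differentiating this function of $x_2$ to obtain $g'(x_2) = 2(4 - 3x_2)(-3)$, and then evaluating this derivative at $x_2 = 3$ to obtain $g'(3) = 2(4 - 3(3))(-3) = 30$. □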
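The same computation can be mirrored in code. This Python sketch uses exact rational arithmetic from the standard `fractions` module; the helper names `dot`, `basis` and `partial_quotient` are illustrative, not from the notes. It confirms that the difference quotient equals exactly $30 + h(\mathbf{c}\cdot\mathbf{e}_2)^2 = 30 + 9h$, as the derivation shows, so it tends to 30.

```python
from fractions import Fraction

c = [4, -3, 2, -1]
x0 = [1, 3, 2, 4]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def f(x):
    return dot(c, x) ** 2

def basis(i, n):
    """Standard basis vector e_i of R^n (0-based index i)."""
    return [1 if k == i else 0 for k in range(n)]

def partial_quotient(f, x0, i, h):
    """Difference quotient [f(x0 + h e_i) - f(x0)] / h."""
    e = basis(i, len(x0))
    shifted = [xk + h * ek for xk, ek in zip(x0, e)]
    return (f(shifted) - f(x0)) / h

for k in range(1, 5):
    h = Fraction(1, 10**k)
    q = partial_quotient(f, x0, 1, h)   # 0-based index 1 is the variable x_2
    print(h, q, q == 30 + 9 * h)
```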

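Just as functions of a single variable can have second derivatives, third derivatives, or derivatives of any order, functions of several variables can have higher-order partial derivatives. To that end, let $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ be a scalar-valued function of $n$ variables $x_1, x_2, \ldots, x_n$. Then, the second partial derivative of $f$ with respect to $x_i$ and $x_j$ at $\mathbf{x}_0 \in D$ is defined to be

$$\begin{aligned}
\frac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf{x}_0) &= f_{x_j x_i}(\mathbf{x}_0) \\
&= \frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_j}\right)(\mathbf{x}_0) \\
&= \lim_{h_i \to 0} \frac{f_{x_j}(\mathbf{x}_0 + h_i\mathbf{e}_i) - f_{x_j}(\mathbf{x}_0)}{h_i} \\
&= \lim_{(h_i,h_j)\to(0,0)} \frac{1}{h_i h_j}\left[f(\mathbf{x}_0 + h_i\mathbf{e}_i + h_j\mathbf{e}_j) - f(\mathbf{x}_0 + h_i\mathbf{e}_i) - f(\mathbf{x}_0 + h_j\mathbf{e}_j) + f(\mathbf{x}_0)\right].
\end{aligned}$$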

The second line of the above definition is the most helpful in terms of describing how to compute a second partial derivative with respect to $x_i$ and $x_j$: first, compute the partial derivative with respect to $x_j$. Then, compute the partial derivative of the result with respect to $x_i$, and finally, evaluate at the point $\mathbf{x}_0$. That is, the second partial derivative, or a partial derivative of higher order, can be viewed as an iterated partial derivative.
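The last line of the definition suggests a finite-difference approximation of a mixed second partial derivative. The Python sketch below is one way to write it, with a hypothetical helper `mixed_quotient`, applied to the earlier example $f(x,y) = x^2y$, whose mixed partials both equal $2x$. For this $f$, the quotient with step $h$ in $x$ equals $4 + h$ exactly in real arithmetic at $(2,-1)$.

```python
def mixed_quotient(f, x0, i, j, hi, hj):
    """Four-point quotient from the last line of the definition:
    [f(x0 + hi*e_i + hj*e_j) - f(x0 + hi*e_i) - f(x0 + hj*e_j) + f(x0)] / (hi*hj),
    with 0-based indices i != j."""
    def shift(x, k, h):
        y = list(x)
        y[k] += h
        return y
    f_ij = f(shift(shift(x0, i, hi), j, hj))
    f_i = f(shift(x0, i, hi))
    f_j = f(shift(x0, j, hj))
    return (f_ij - f_i - f_j + f(x0)) / (hi * hj)

def f(p):
    x, y = p
    return x**2 * y          # both mixed partials of x^2 y equal 2x

for h in [1e-1, 1e-2, 1e-3]:
    print(h, mixed_quotient(f, [2.0, -1.0], 0, 1, h, h))   # tends to 2x = 4
```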

A commonly used method of indicating that a function is evaluated at a given point, especially if the formula for the function is complicated or otherwise does not lend itself naturally to the usual notation for evaluation at a point, is to follow the function with a vertical bar, and indicate the evaluation point as a subscript to the bar. For example, given a function $f(x)$, we can write

$$f'(4) = \left.\frac{df}{dx}\right|_{x=4} = \left.\frac{df}{dx}\right|_{4},$$

or, given a function $f(x,y)$, we can write

$$f_x(2,3) = \left.\frac{\partial f}{\partial x}\right|_{x=2,\,y=3} = \left.\frac{\partial f}{\partial x}\right|_{(2,3)}.$$

This notation is similar to the use of the vertical bar in the evaluation of definite integrals, to indicate that an antiderivative is to be evaluated at the limits of integration.

1.4.2 Clairaut’s Theorem

The following theorem is very useful for reducing the amount of work necessary to compute all of the higher-order partial derivatives of a function.

Theorem (Clairaut's Theorem): Let $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$, and let $\mathbf{x}_0 \in D$. If the second partial derivatives $f_{xy}$ and $f_{yx}$ are continuous on $D$, then they are equal:

$$f_{xy}(\mathbf{x}_0) = f_{yx}(\mathbf{x}_0).$$

Example Let $f(x,y) = \sin(2x)\cos^2(4y)$. Then

$$f_x = 2\cos^2(4y)\cos(2x), \qquad f_y = -8\sin(2x)\cos(4y)\sin(4y),$$

which yields

$$f_{xy} = \left(2\cos^2(4y)\cos(2x)\right)_y = -16\cos(2x)\cos(4y)\sin(4y)$$

and

$$f_{yx} = \left(-8\sin(2x)\cos(4y)\sin(4y)\right)_x = -16\cos(2x)\cos(4y)\sin(4y),$$

and we conclude that these mixed partial derivatives are equal. □
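As a numerical cross-check (a sketch, not part of the original notes), the Python code below differentiates the formula for $f_x$ with respect to $y$ and the formula for $f_y$ with respect to $x$ using central differences at an arbitrarily chosen point, and compares both with the closed form $-16\cos(2x)\cos(4y)\sin(4y)$.

```python
import math

def fx(x, y):
    return 2 * math.cos(4 * y) ** 2 * math.cos(2 * x)

def fy(x, y):
    return -8 * math.sin(2 * x) * math.cos(4 * y) * math.sin(4 * y)

def mixed_closed_form(x, y):
    return -16 * math.cos(2 * x) * math.cos(4 * y) * math.sin(4 * y)

def central(g, t, h=1e-5):
    """Central-difference approximation of g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)

x, y = 0.3, 0.7  # an arbitrary sample point
fxy = central(lambda s: fx(x, s), y)   # differentiate f_x with respect to y
fyx = central(lambda s: fy(s, y), x)   # differentiate f_y with respect to x
print(fxy, fyx, mixed_closed_form(x, y))
```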

1.4.3 Techniques

We now describe the most practical techniques for computing partial derivatives. As mentioned previously, computing the partial derivative of a function with respect to a given variable, at a given point, is equivalent to "freezing" the values of all other variables at that point, and then computing the derivative of the resulting function of one variable at that point.

However, it is generally most practical to compute the partial derivative as a function of all of the independent variables, which can then be evaluated at any point at which we wish to know the value of the partial derivative, just as when we have a function $f(x)$, we normally compute its derivative as a function $f'(x)$, and then evaluate that function at any point $x_0$ where we want to know the rate of change.

Therefore, the most practical approach to computing a partial derivative of a function $f$ with respect to $x_i$ is to apply differentiation rules from single-variable calculus to differentiate $f$ with respect to $x_i$, while treating all other variables as constants. The result of this process is a function that represents $\partial f/\partial x_i(x_1, x_2, \ldots, x_n)$, and then values can be substituted for the independent variables $x_1, x_2, \ldots, x_n$.

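Example To compute $f_x(\pi/2, \pi)$ for $f(x,y) = e^{-(x^2+y^2)}\sin 3x\cos 4y$, we treat $y$ as a constant, since we are differentiating with respect to $x$. Using the Product Rule and the Chain Rule from single-variable calculus, as well as the rules for differentiating exponential and trigonometric functions, we obtain

$$\begin{aligned}
f_x(\pi/2,\pi) &= \left.\frac{\partial}{\partial x}\left[e^{-(x^2+y^2)}\sin 3x\cos 4y\right]\right|_{x=\pi/2,\,y=\pi} \\
&= \left.\cos 4y\,\frac{\partial}{\partial x}\left[e^{-(x^2+y^2)}\sin 3x\right]\right|_{x=\pi/2,\,y=\pi} \\
&= \left.\cos 4y\left[\sin 3x\,\frac{\partial}{\partial x}\left[e^{-(x^2+y^2)}\right] + e^{-(x^2+y^2)}\frac{\partial}{\partial x}[\sin 3x]\right]\right|_{x=\pi/2,\,y=\pi} \\
&= \left.\cos 4y\left[e^{-(x^2+y^2)}\sin 3x\,\frac{\partial}{\partial x}\left[-(x^2+y^2)\right] + 3e^{-(x^2+y^2)}\cos 3x\right]\right|_{x=\pi/2,\,y=\pi} \\
&= \left.\cos 4y\left[-2xe^{-(x^2+y^2)}\sin 3x + 3e^{-(x^2+y^2)}\cos 3x\right]\right|_{x=\pi/2,\,y=\pi} \\
&= \cos 4\pi\left[-2(\pi/2)e^{-((\pi/2)^2+\pi^2)}\sin(3\pi/2) + 3e^{-((\pi/2)^2+\pi^2)}\cos(3\pi/2)\right] \\
&= \pi e^{-5\pi^2/4}.
\end{aligned}$$

Similarly, to compute $f_y(\pi/2,\pi)$, we treat $x$ as a constant, and apply these differentiation rules to differentiate with respect to $y$. Finally, we substitute $x = \pi/2$ and $y = \pi$ into the resulting derivative. □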
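A central-difference approximation offers a quick check of this value. In the Python sketch below, the step $h = 10^{-5}$ is an arbitrary choice; only $x$ is perturbed, matching the idea of holding $y$ fixed.

```python
import math

def f(x, y):
    return math.exp(-(x**2 + y**2)) * math.sin(3 * x) * math.cos(4 * y)

x0, y0, h = math.pi / 2, math.pi, 1e-5
approx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)   # central difference in x only
exact = math.pi * math.exp(-5 * math.pi**2 / 4)
print(approx, exact)
```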

This approach to differentiation can also be applied to compute higher-order partial derivatives, as long as any substitution of values for the variables is deferred to the end.

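Example To evaluate the second partial derivatives of $f(x,y) = \ln|x + y^2|$ at $x = 1$, $y = 2$, we first compute the first partial derivatives of $f$:

$$f_x = \frac{1}{x+y^2}\frac{\partial}{\partial x}[x+y^2] = \frac{1}{x+y^2}, \qquad f_y = \frac{1}{x+y^2}\frac{\partial}{\partial y}[x+y^2] = \frac{2y}{x+y^2}.$$

Next, we differentiate each of these partial derivatives with respect to both $x$ and $y$ to obtain

$$\begin{aligned}
f_{xx} &= (f_x)_x = \left(\frac{1}{x+y^2}\right)_x = -\frac{1}{(x+y^2)^2}\frac{\partial}{\partial x}[x+y^2] = -\frac{1}{(x+y^2)^2}, \\
f_{xy} &= (f_x)_y = \left(\frac{1}{x+y^2}\right)_y = -\frac{1}{(x+y^2)^2}\frac{\partial}{\partial y}[x+y^2] = -\frac{2y}{(x+y^2)^2}, \\
f_{yx} &= f_{xy} = -\frac{2y}{(x+y^2)^2}, \\
f_{yy} &= (f_y)_y = \left(\frac{2y}{x+y^2}\right)_y = \frac{(x+y^2)(2y)_y - 2y(x+y^2)_y}{(x+y^2)^2} = \frac{2(x+y^2) - 4y^2}{(x+y^2)^2} = \frac{2(x-y^2)}{(x+y^2)^2}.
\end{aligned}$$

Finally, we can evaluate these second partial derivatives at $x = 1$ and $y = 2$ to obtain

$$f_{xx}(1,2) = -\frac{1}{25}, \qquad f_{xy}(1,2) = f_{yx}(1,2) = -\frac{4}{25}, \qquad f_{yy}(1,2) = -\frac{6}{25}.$$

□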
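These values can be sanity-checked with standard central-difference formulas for second derivatives. The Python sketch below uses an arbitrary step $h = 10^{-4}$ and a hypothetical helper `second_partials`; agreement is only to about five or six decimal places because of floating-point cancellation.

```python
import math

def f(x, y):
    return math.log(abs(x + y**2))

def second_partials(f, x, y, h=1e-4):
    """Central-difference approximations of f_xx, f_xy, f_yy at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

approx = second_partials(f, 1.0, 2.0)
exact = (-1 / 25, -4 / 25, -6 / 25)
for a, e in zip(approx, exact):
    print(f"{a:.6f}  {e:.6f}")
```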

Example Let $f(x,y,z) = x^2y^4z^3$. We will compute the second partial derivatives of this function at the point $(x_0,y_0,z_0) = (-1,2,3)$ by repeated computation of first partial derivatives. First, we compute

$$f_x = (x^2y^4z^3)_x = (x^2)_x\,y^4z^3 = 2xy^4z^3,$$

by treating $y$ and $z$ as constants, then

$$f_y = (x^2y^4z^3)_y = (y^4)_y\,x^2z^3 = 4x^2y^3z^3,$$

by treating $x$ and $z$ as constants, and then

$$f_z = (x^2y^4z^3)_z = x^2y^4(z^3)_z = 3x^2y^4z^2.$$

We then differentiate each of these with respect to $x$, $y$ and $z$ to obtain the second partial derivatives:

$$\begin{aligned}
f_{xx} &= (f_x)_x = (2xy^4z^3)_x = 2y^4z^3, \\
f_{xy} &= (f_x)_y = (2xy^4z^3)_y = (2x)(4y^3)(z^3) = 8xy^3z^3, \\
f_{xz} &= (f_x)_z = (2xy^4z^3)_z = (2xy^4)(3z^2) = 6xy^4z^2, \\
f_{yx} &= (f_y)_x = (4x^2y^3z^3)_x = 8xy^3z^3, \\
f_{yy} &= (f_y)_y = (4x^2y^3z^3)_y = (4x^2)(3y^2)(z^3) = 12x^2y^2z^3, \\
f_{yz} &= (f_y)_z = (4x^2y^3z^3)_z = (4x^2y^3)(3z^2) = 12x^2y^3z^2, \\
f_{zx} &= (f_z)_x = (3x^2y^4z^2)_x = 6xy^4z^2, \\
f_{zy} &= (f_z)_y = (3x^2y^4z^2)_y = (3x^2)(4y^3)(z^2) = 12x^2y^3z^2, \\
f_{zz} &= (f_z)_z = (3x^2y^4z^2)_z = (3x^2y^4)(2z) = 6x^2y^4z.
\end{aligned}$$

Then, these can be evaluated at $(x_0,y_0,z_0)$ by substituting $x = -1$, $y = 2$, and $z = 3$ to obtain

$$\begin{aligned}
&f_{xx}(-1,2,3) = 864, && f_{xy}(-1,2,3) = -1728, && f_{xz}(-1,2,3) = -864, \\
&f_{yx}(-1,2,3) = -1728, && f_{yy}(-1,2,3) = 1296, && f_{yz}(-1,2,3) = 864, \\
&f_{zx}(-1,2,3) = -864, && f_{zy}(-1,2,3) = 864, && f_{zz}(-1,2,3) = 288.
\end{aligned}$$

Note that the order in which the partial differentiations are performed does not appear to matter; for example, $f_{xy} = f_{yx}$. In fact, Clairaut's Theorem extends to functions of any number of variables, and to partial derivatives of any order, provided that the partial derivatives involved are continuous. For example,

$$f_{xyy} = (f_{xy})_y = (8xy^3z^3)_y = 24xy^2z^3, \qquad f_{yyx} = (f_{yy})_x = (12x^2y^2z^3)_x = 24xy^2z^3.$$

□
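Because $f$ is a single monomial, its partial derivatives can be computed exactly by a tiny symbolic routine. The Python sketch below uses its own illustrative monomial representation and helper names (`d`, `evaluate`); it reproduces the table of values at $(-1,2,3)$, including $f_{xy} = f_{yx}$, and the third-order check $f_{xyy} = f_{yyx}$.

```python
from itertools import product

# A monomial coeff * x^a * y^b * z^c is stored as (coeff, (a, b, c)).
VARS = "xyz"

def d(mono, var):
    """Partial derivative of a monomial with respect to var ('x', 'y' or 'z')."""
    coeff, exps = mono
    k = VARS.index(var)
    if exps[k] == 0:
        return (0, exps)
    new_exps = list(exps)
    new_exps[k] -= 1
    return (coeff * exps[k], tuple(new_exps))

def evaluate(mono, point):
    coeff, exps = mono
    value = coeff
    for base, e in zip(point, exps):
        value *= base ** e
    return value

f = (1, (2, 4, 3))       # x^2 y^4 z^3
point = (-1, 2, 3)

# Key "uv" means: differentiate with respect to u first, then v (so "xy" is f_xy).
hessian = {u + v: evaluate(d(d(f, u), v), point) for u, v in product(VARS, repeat=2)}
print(hessian)

f_xyy = evaluate(d(d(d(f, "x"), "y"), "y"), point)
f_yyx = evaluate(d(d(d(f, "y"), "y"), "x"), point)
print(f_xyy, f_yyx)      # both equal 24 x y^2 z^3 at (-1, 2, 3), i.e. -2592
```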

In single-variable calculus, implicit differentiation is applied to an equation that implicitly describes $y$ as a function of $x$, in order to compute $dy/dx$. The same approach can be applied to an equation that implicitly describes any number of dependent variables in terms of any number of independent variables. The approach is the same as in the single-variable case: differentiate both sides of the equation with respect to the independent variable, leaving derivatives of dependent variables in the equation as unknowns. The resulting equation can then be solved for the unknown partial derivatives.

Example Consider the equation

$$x^2z + y^2z + z^2 = 1.$$

If we view this equation as one that implicitly describes $z$ as a function of $x$ and $y$, we can compute $z_x$ and $z_y$ using implicit differentiation with respect to $x$ and $y$, respectively. Applying the Product Rule yields the equations

$$2xz + x^2z_x + y^2z_x + 2zz_x = 0, \qquad x^2z_y + 2yz + y^2z_y + 2zz_y = 0,$$

which can then be solved for the partial derivatives to obtain

$$z_x = -\frac{2xz}{x^2+y^2+2z}, \qquad z_y = -\frac{2yz}{x^2+y^2+2z}.$$

□
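To check these formulas numerically, one can solve the equation for $z$ explicitly, since it is a quadratic in $z$. The Python sketch below assumes the branch given by the positive root (an assumption; the notes do not specify a branch), then compares central-difference derivatives of that branch with the formulas for $z_x$ and $z_y$ at an arbitrary point.

```python
import math

def z_of(x, y):
    """Positive root of z^2 + (x^2 + y^2) z - 1 = 0, i.e. one branch
    of the surface x^2 z + y^2 z + z^2 = 1 (the branch choice is an assumption)."""
    s = x**2 + y**2
    return (-s + math.sqrt(s**2 + 4)) / 2

def zx_formula(x, y):
    z = z_of(x, y)
    return -2 * x * z / (x**2 + y**2 + 2 * z)

def zy_formula(x, y):
    z = z_of(x, y)
    return -2 * y * z / (x**2 + y**2 + 2 * z)

x, y, h = 0.8, -0.5, 1e-6
zx_fd = (z_of(x + h, y) - z_of(x - h, y)) / (2 * h)
zy_fd = (z_of(x, y + h) - z_of(x, y - h)) / (2 * h)
print(zx_fd, zx_formula(x, y))
print(zy_fd, zy_formula(x, y))
```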

1.5 Tangent Planes, Linear Approximations and Differentiability

Now that we have learned how to compute partial derivatives of functions of several independent variables, in order to measure their instantaneous rates of change with respect to these variables, we will discuss another essential application of derivatives: the approximation of functions by linear functions. Linear functions are the simplest to work with, and for this reason, there are many instances in which functions are replaced by a linear approximation in the context of solving a problem such as solving a differential equation.

1.5.1 Tangent Planes and Linear Approximations

In single-variable calculus, we learned that the graph of a function $f(x)$ can be approximated near a point $x_0$ by its tangent line, which has the equation

$$y = f(x_0) + f'(x_0)(x - x_0).$$

For this reason, the function $L_f(x) = f(x_0) + f'(x_0)(x - x_0)$ is also referred to as the linearization, or linear approximation, of $f(x)$ at $x_0$.

Now, suppose that we have a function of two variables, $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$, and a point $(x_0,y_0) \in D$. Furthermore, suppose that the first partial derivatives of $f$, $f_x$ and $f_y$, exist at $(x_0,y_0)$. Because the graph of this function is a surface, it follows that a linear function that approximates $f$ near $(x_0,y_0)$ would have a graph that is a plane.

Just as the tangent line of $f(x)$ at $x_0$ passes through the point $(x_0, f(x_0))$, and has a slope that is equal to $f'(x_0)$, the instantaneous rate of change of $f(x)$ with respect to $x$ at $x_0$, a plane that best approximates $f(x,y)$ at $(x_0,y_0)$ must pass through the point $(x_0, y_0, f(x_0,y_0))$, and the slopes of the plane in the $x$- and $y$-directions, respectively, should be equal to the values of $f_x(x_0,y_0)$ and $f_y(x_0,y_0)$.

A general linear function of two variables can be described by the formula

$$L_f(x,y) = A(x - x_0) + B(y - y_0) + C,$$

so that $L_f(x_0,y_0) = C$, and a simple differentiation yields

$$\frac{\partial L_f}{\partial x} = A, \qquad \frac{\partial L_f}{\partial y} = B.$$

Matching these values to $f(x_0,y_0)$, $f_x(x_0,y_0)$ and $f_y(x_0,y_0)$, we conclude that the linear function that best approximates $f(x,y)$ near $(x_0,y_0)$ is the linear approximation

$$L_f(x,y) = f(x_0,y_0) + \frac{\partial f}{\partial x}(x_0,y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0,y_0)(y - y_0).$$

Furthermore, the graph of this function is called the tangent plane of $f(x,y)$ at $(x_0,y_0)$. Its equation is

$$z - z_0 = \frac{\partial f}{\partial x}(x_0,y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0,y_0)(y - y_0),$$

where $z_0 = f(x_0,y_0)$.

Example Let $f(x,y) = 2x^2y + 3y^2$, and let $(x_0,y_0) = (1,1)$. Then $f(x_0,y_0) = 5$, and the first partial derivatives at $(x_0,y_0)$ are

$$f_x(1,1) = 4xy\big|_{x=1,\,y=1} = 4, \qquad f_y(1,1) = 2x^2 + 6y\big|_{x=1,\,y=1} = 8.$$

It follows that the tangent plane at $(1,1)$ has the equation

$$z - 5 = 4(x-1) + 8(y-1),$$

and the linearization of $f$ at $(1,1)$ is

$$L_f(x,y) = 5 + 4(x-1) + 8(y-1).$$

Let $(x,y) = (1.1, 1.1)$. Then $f(x,y) = 6.292$, while $L_f(x,y) = 6.2$, for an error of $6.292 - 6.2 = 0.092$. However, if $(x,y) = (1.01, 1.01)$, then $f(x,y) = 5.120902$, while $L_f(x,y) = 5.12$, for an error of $5.120902 - 5.12 = 0.000902$. That is, moving 10 times as close to $(1,1)$ decreased the error by a factor of over 100. □
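The error pattern in this example can be reproduced with a few lines of Python (an illustrative sketch; the step sizes are arbitrary). Along the diagonal, $f(1+s,1+s) - L_f(1+s,1+s) = 9s^2 + 2s^3$, so shrinking $s$ by a factor of 10 shrinks the error by a factor slightly above 100.

```python
def f(x, y):
    return 2 * x**2 * y + 3 * y**2

def L(x, y):
    """Linearization of f at (1, 1) from the example."""
    return 5 + 4 * (x - 1) + 8 * (y - 1)

errors = []
for step in [0.1, 0.01, 0.001]:
    x = y = 1 + step
    err = f(x, y) - L(x, y)
    errors.append(err)
    print(f"step={step:<6} f={f(x, y):.9f}  L={L(x, y):.9f}  error={err:.3e}")

# Each tenfold reduction of the step cuts the error by a factor just above 100.
ratios = [errors[k] / errors[k + 1] for k in range(len(errors) - 1)]
print(ratios)
```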

Another useful application of a linear approximation is to estimate the error in the value of a function, given estimates of error in its inputs. Given a function $z = f(x,y)$, and its linearization $L_f(x,y)$ around a point $(x_0,y_0)$, if $x_0$ and $y_0$ are measured values and $dx = x - x_0$ and $dy = y - y_0$ are regarded as errors in $x_0$ and $y_0$, then the error in $z$ can be estimated by computing

$$\begin{aligned}
dz = z - z_0 &= L_f(x,y) - f(x_0,y_0) \\
&= \left[f(x_0,y_0) + f_x(x_0,y_0)(x - x_0) + f_y(x_0,y_0)(y - y_0)\right] - f(x_0,y_0) \\
&= f_x(x_0,y_0)\,dx + f_y(x_0,y_0)\,dy.
\end{aligned}$$

The variables $dx$ and $dy$ are called differentials, and $dz$ is called the total differential, as it depends on the values of $dx$ and $dy$. The total differential $dz$ is only an estimate of the error in $z$; the actual error is given by $\Delta z = f(x,y) - f(x_0,y_0)$, when the actual errors in $x$ and $y$, $\Delta x = x - x_0$ and $\Delta y = y - y_0$, are known. Since this is rarely the case in practice, one instead estimates the error in $z$ from estimates $dx$ and $dy$ of the errors in $x$ and $y$.

Example Recall that the volume of a cylinder with radius $r$ and height $h$ is $V = \pi r^2h$. Suppose that $r = 5$ cm and $h = 10$ cm. Then the volume is $V = 250\pi$ cm³. If the measurement error in $r$ and $h$ is at most 0.1 cm, then, to estimate the error in the computed volume, we first compute

$$V_r = 2\pi rh = 100\pi, \qquad V_h = \pi r^2 = 25\pi.$$

It follows that the error in $V$ is approximately

$$dV = V_r\,dr + V_h\,dh = 0.1(100\pi + 25\pi) = 12.5\pi \text{ cm}^3.$$

If we specify $\Delta r = 0.1$ and $\Delta h = 0.1$, and compute the actual volume using radius $r + \Delta r = 5.1$ and height $h + \Delta h = 10.1$, we obtain

$$V + \Delta V = \pi(5.1)^2(10.1) = 262.701\pi \text{ cm}^3,$$

which yields the actual error

$$\Delta V = 262.701\pi - 250\pi = 12.701\pi \text{ cm}^3.$$

Therefore, the estimate of the error, $dV$, is quite accurate. □
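The comparison between the total differential and the actual change can be reproduced with a short Python sketch; the variable names are illustrative, and the errors $dr = dh = 0.1$ cm are the ones assumed in the example.

```python
import math

r, h = 5.0, 10.0      # measured radius and height (cm)
dr = dh = 0.1         # assumed measurement errors (cm)

V = math.pi * r**2 * h
V_r = 2 * math.pi * r * h      # partial derivative of V with respect to r
V_h = math.pi * r**2           # partial derivative of V with respect to h

dV = V_r * dr + V_h * dh                        # total differential (estimate)
delta_V = math.pi * (r + dr)**2 * (h + dh) - V  # actual change

print(dV / math.pi, delta_V / math.pi)          # in units of pi cm^3
```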

1.5.2 Functions of More than Two Variables

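The concepts of a tangent plane and linear approximation generalize to more than two variables in a straightforward manner. Specifically, given $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ and $\mathbf{p}_0 = (x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)}) \in D$, we define the tangent space of $f(x_1, x_2, \ldots, x_n)$ at $\mathbf{p}_0$ to be the $n$-dimensional hyperplane in $\mathbb{R}^{n+1}$ whose points $(x_1, x_2, \ldots, x_n, y)$ satisfy the equation

$$y - y_0 = \frac{\partial f}{\partial x_1}(\mathbf{p}_0)(x_1 - x_1^{(0)}) + \frac{\partial f}{\partial x_2}(\mathbf{p}_0)(x_2 - x_2^{(0)}) + \cdots + \frac{\partial f}{\partial x_n}(\mathbf{p}_0)(x_n - x_n^{(0)}),$$

where $y_0 = f(\mathbf{p}_0)$. Similarly, the linearization of $f$ at $\mathbf{p}_0$ is the function $L_f(x_1, x_2, \ldots, x_n)$ defined by

$$L_f(x_1, x_2, \ldots, x_n) = y_0 + \frac{\partial f}{\partial x_1}(\mathbf{p}_0)(x_1 - x_1^{(0)}) + \frac{\partial f}{\partial x_2}(\mathbf{p}_0)(x_2 - x_2^{(0)}) + \cdots + \frac{\partial f}{\partial x_n}(\mathbf{p}_0)(x_n - x_n^{(0)}).$$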

1.5.3 The Gradient Vector

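It can be seen from the above definitions that writing formulas that involve the partial derivatives of functions of $n$ variables can be cumbersome. This can be addressed by expressing collections of partial derivatives of functions of several variables using vectors and matrices, especially for vector-valued functions of several variables.

By convention, a point $\mathbf{p}_0 = (x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)})$, which can be identified with the position vector $\mathbf{p}_0 = \langle x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)}\rangle$, is considered to be a column vector

$$\mathbf{p}_0 = \begin{bmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{bmatrix}.$$

Also, by convention, given a function of $n$ variables, $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$, the collection of its partial derivatives with respect to all of its variables is written as a row vector

$$\nabla f(\mathbf{p}_0) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(\mathbf{p}_0) & \dfrac{\partial f}{\partial x_2}(\mathbf{p}_0) & \cdots & \dfrac{\partial f}{\partial x_n}(\mathbf{p}_0) \end{bmatrix}.$$

This vector is called the gradient of $f$ at $\mathbf{p}_0$.

Viewing the partial derivatives of $f$ as a vector allows us to use vector operations to describe, much more concisely, the linearization of $f$. Specifically, the linearization of $f$ at $\mathbf{p}_0$, evaluated at a point $\mathbf{p} = (x_1, x_2, \ldots, x_n)$, can be written as

$$\begin{aligned}
L_f(\mathbf{p}) &= f(\mathbf{p}_0) + \frac{\partial f}{\partial x_1}(\mathbf{p}_0)(x_1 - x_1^{(0)}) + \frac{\partial f}{\partial x_2}(\mathbf{p}_0)(x_2 - x_2^{(0)}) + \cdots + \frac{\partial f}{\partial x_n}(\mathbf{p}_0)(x_n - x_n^{(0)}) \\
&= f(\mathbf{p}_0) + \sum_{i=1}^n \frac{\partial f}{\partial x_i}(\mathbf{p}_0)(x_i - x_i^{(0)}) \\
&= f(\mathbf{p}_0) + \nabla f(\mathbf{p}_0)\cdot(\mathbf{p} - \mathbf{p}_0),
\end{aligned}$$

where $\nabla f(\mathbf{p}_0)\cdot(\mathbf{p} - \mathbf{p}_0)$ is the dot product, also known as the inner product, of the vectors $\nabla f(\mathbf{p}_0)$ and $\mathbf{p} - \mathbf{p}_0$. Recall that given two vectors $\mathbf{u} = \langle u_1, u_2, \ldots, u_n\rangle$ and $\mathbf{v} = \langle v_1, v_2, \ldots, v_n\rangle$, the dot product of $\mathbf{u}$ and $\mathbf{v}$, denoted by $\mathbf{u}\cdot\mathbf{v}$, is defined by

$$\mathbf{u}\cdot\mathbf{v} = \sum_{i=1}^n u_iv_i = u_1v_1 + u_2v_2 + \cdots + u_nv_n = \|\mathbf{u}\|\|\mathbf{v}\|\cos\theta,$$

where $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$.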

Example Let $f : \mathbb{R}^3 \to \mathbb{R}$ be defined by

$$f(x,y,z) = 3x^2y^3z^4.$$

Then

$$\nabla f(x,y,z) = \begin{bmatrix} f_x & f_y & f_z \end{bmatrix} = \begin{bmatrix} 6xy^3z^4 & 9x^2y^2z^4 & 12x^2y^3z^3 \end{bmatrix}.$$

Let $(x_0,y_0,z_0) = (1,2,-1)$. Then

$$\nabla f(x_0,y_0,z_0) = \nabla f(1,2,-1) = \begin{bmatrix} f_x(1,2,-1) & f_y(1,2,-1) & f_z(1,2,-1) \end{bmatrix} = \begin{bmatrix} 48 & 36 & -96 \end{bmatrix}.$$

It follows that the linearization of $f$ at $(x_0,y_0,z_0)$ is

$$\begin{aligned}
L_f(x,y,z) &= f(1,2,-1) + \nabla f(1,2,-1)\cdot\langle x-1, y-2, z+1\rangle \\
&= 24 + \langle 48, 36, -96\rangle\cdot\langle x-1, y-2, z+1\rangle \\
&= 24 + 48(x-1) + 36(y-2) - 96(z+1) \\
&= 48x + 36y - 96z - 192.
\end{aligned}$$

At the point $(1.1, 1.9, -1.1)$, we have $f(1.1, 1.9, -1.1) \approx 36.5$, while $L_f(1.1, 1.9, -1.1) = 34.8$. Because $f$ is changing rapidly in all coordinate directions at $(1,2,-1)$, it is not surprising that the linearization of $f$ at this point is not highly accurate. □
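A Python sketch of the gradient form of the linearization, $L_f(\mathbf{p}) = f(\mathbf{p}_0) + \nabla f(\mathbf{p}_0)\cdot(\mathbf{p} - \mathbf{p}_0)$, applied to this example. The helper `linearization` is a hypothetical name, and the gradient is supplied by hand from the formulas above rather than computed automatically.

```python
def f(x, y, z):
    return 3 * x**2 * y**3 * z**4

def grad_f(x, y, z):
    """Gradient [f_x, f_y, f_z] computed from the formulas in the example."""
    return [6 * x * y**3 * z**4, 9 * x**2 * y**2 * z**4, 12 * x**2 * y**3 * z**3]

def linearization(f, grad, p0):
    """Return the function p -> f(p0) + grad(p0) . (p - p0)."""
    f0, g0 = f(*p0), grad(*p0)
    def L(*p):
        return f0 + sum(gi * (xi - ci) for gi, xi, ci in zip(g0, p, p0))
    return L

p0 = (1, 2, -1)
L = linearization(f, grad_f, p0)
print(grad_f(*p0))                      # [48, 36, -96]
print(f(1.1, 1.9, -1.1), L(1.1, 1.9, -1.1))
```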


1.5.4 The Jacobian Matrix

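Now, let $\mathbf{f} : D \subseteq \mathbb{R}^n \to \mathbb{R}^m$ be a vector-valued function of $n$ variables, with component functions

$$\mathbf{f}(\mathbf{p}) = \begin{bmatrix} f_1(\mathbf{p}) \\ f_2(\mathbf{p}) \\ \vdots \\ f_m(\mathbf{p}) \end{bmatrix},$$

where each $f_i : D \to \mathbb{R}$. Combining the two conventions described above, the partial derivatives of these component functions at a point $\mathbf{p}_0 \in D$ are arranged in an $m \times n$ matrix

$$J_{\mathbf{f}}(\mathbf{p}_0) = \begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1}(\mathbf{p}_0) & \dfrac{\partial f_1}{\partial x_2}(\mathbf{p}_0) & \cdots & \dfrac{\partial f_1}{\partial x_n}(\mathbf{p}_0) \\
\dfrac{\partial f_2}{\partial x_1}(\mathbf{p}_0) & \dfrac{\partial f_2}{\partial x_2}(\mathbf{p}_0) & \cdots & \dfrac{\partial f_2}{\partial x_n}(\mathbf{p}_0) \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial f_m}{\partial x_1}(\mathbf{p}_0) & \dfrac{\partial f_m}{\partial x_2}(\mathbf{p}_0) & \cdots & \dfrac{\partial f_m}{\partial x_n}(\mathbf{p}_0)
\end{bmatrix}.$$

This matrix is called the Jacobian matrix of $\mathbf{f}$ at $\mathbf{p}_0$. It is also referred to as the derivative of $\mathbf{f}$ at $\mathbf{p}_0$, since it reduces to the scalar $f'(x_0)$ when $f$ is a scalar-valued function of one variable. Note that rows of $J_{\mathbf{f}}(\mathbf{p}_0)$ correspond to component functions, and columns correspond to independent variables. This allows us to view $J_{\mathbf{f}}(\mathbf{p}_0)$ as the following collections of rows or columns:

$$J_{\mathbf{f}}(\mathbf{p}_0) = \begin{bmatrix} \nabla f_1(\mathbf{p}_0) \\ \nabla f_2(\mathbf{p}_0) \\ \vdots \\ \nabla f_m(\mathbf{p}_0) \end{bmatrix} = \begin{bmatrix} \dfrac{\partial \mathbf{f}}{\partial x_1}(\mathbf{p}_0) & \dfrac{\partial \mathbf{f}}{\partial x_2}(\mathbf{p}_0) & \cdots & \dfrac{\partial \mathbf{f}}{\partial x_n}(\mathbf{p}_0) \end{bmatrix}.$$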

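The Jacobian matrix provides a concise way of describing the linearization of a vector-valued function, just as the gradient does for a scalar-valued function. The linearization of $\mathbf{f}$ at $\mathbf{p}_0$ is the function $L_{\mathbf{f}}(\mathbf{p})$, defined by

$$\begin{aligned}
L_{\mathbf{f}}(\mathbf{p}) &= \begin{bmatrix} f_1(\mathbf{p}_0) \\ f_2(\mathbf{p}_0) \\ \vdots \\ f_m(\mathbf{p}_0) \end{bmatrix} + \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1}(\mathbf{p}_0) \\ \dfrac{\partial f_2}{\partial x_1}(\mathbf{p}_0) \\ \vdots \\ \dfrac{\partial f_m}{\partial x_1}(\mathbf{p}_0) \end{bmatrix}(x_1 - x_1^{(0)}) + \cdots + \begin{bmatrix} \dfrac{\partial f_1}{\partial x_n}(\mathbf{p}_0) \\ \dfrac{\partial f_2}{\partial x_n}(\mathbf{p}_0) \\ \vdots \\ \dfrac{\partial f_m}{\partial x_n}(\mathbf{p}_0) \end{bmatrix}(x_n - x_n^{(0)}) \\
&= \mathbf{f}(\mathbf{p}_0) + \sum_{j=1}^n \frac{\partial \mathbf{f}}{\partial x_j}(\mathbf{p}_0)(x_j - x_j^{(0)}) \\
&= \mathbf{f}(\mathbf{p}_0) + J_{\mathbf{f}}(\mathbf{p}_0)(\mathbf{p} - \mathbf{p}_0),
\end{aligned}$$

where the expression $J_{\mathbf{f}}(\mathbf{p}_0)(\mathbf{p} - \mathbf{p}_0)$ involves matrix multiplication of the matrix $J_{\mathbf{f}}(\mathbf{p}_0)$ and the vector $\mathbf{p} - \mathbf{p}_0$. Note the similarity between this definition and the definition of the linearization of a function of a single variable.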

In general, given an $m \times n$ matrix $A$, that is, a matrix $A$ with $m$ rows and $n$ columns, and an $n \times p$ matrix $B$, the product $AB$ is the $m \times p$ matrix $C$, where the entry in row $i$ and column $j$ of $C$ is obtained by computing the dot product of row $i$ of $A$ and column $j$ of $B$. When computing the linearization of a vector-valued function $\mathbf{f}$ at the point $\mathbf{p}_0$ in its domain, the $i$th component function of the linearization is obtained by adding the value of the $i$th component function at $\mathbf{p}_0$, $f_i(\mathbf{p}_0)$, to the dot product of $\nabla f_i(\mathbf{p}_0)$ and the vector $\mathbf{p} - \mathbf{p}_0$, where $\mathbf{p}$ is the vector at which the linearization is to be evaluated.
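The row-times-column rule can be written directly in Python. This minimal sketch represents matrices as lists of rows and uses arbitrary sample matrices; it also shows that a column vector is simply an $n \times 1$ matrix, which is how the product $J_{\mathbf{f}}(\mathbf{p}_0)(\mathbf{p} - \mathbf{p}_0)$ is formed.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B, both given as lists of rows."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    # Entry (i, j) of C = AB is the dot product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

A = [[1, 2, 3],
     [4, 5, 6]]            # 2 x 3 (arbitrary sample)
B = [[1, 0],
     [0, 1],
     [2, -1]]              # 3 x 2 (arbitrary sample)
print(matmul(A, B))        # [[7, -1], [16, -1]]

# A column vector is an n x 1 matrix, so J (m x n) times p - p0 (n x 1) is m x 1.
J = [[2, 1],
     [0, 3]]
d = [[0.5],
     [-1.0]]
print(matmul(J, d))        # [[0.0], [-3.0]]
```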

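Example Let $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by

$$\mathbf{f}(x,y) = \begin{bmatrix} f_1(x,y) \\ f_2(x,y) \end{bmatrix} = \begin{bmatrix} e^x\cos y \\ e^{-2x}\sin y \end{bmatrix}.$$

Then the Jacobian matrix, or derivative, of $\mathbf{f}$ is the $2\times 2$ matrix

$$J_{\mathbf{f}}(x,y) = \begin{bmatrix} \nabla f_1(x,y) \\ \nabla f_2(x,y) \end{bmatrix} = \begin{bmatrix} (f_1)_x & (f_1)_y \\ (f_2)_x & (f_2)_y \end{bmatrix} = \begin{bmatrix} e^x\cos y & -e^x\sin y \\ -2e^{-2x}\sin y & e^{-2x}\cos y \end{bmatrix}.$$

Let $(x_0,y_0) = (0,\pi/4)$. Then we have

$$J_{\mathbf{f}}(x_0,y_0) = \begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\ -\sqrt{2} & \frac{\sqrt{2}}{2} \end{bmatrix},$$

and the linearization of $\mathbf{f}$ at $(x_0,y_0)$ is

$$\begin{aligned}
L_{\mathbf{f}}(x,y) &= \begin{bmatrix} f_1(x_0,y_0) \\ f_2(x_0,y_0) \end{bmatrix} + J_{\mathbf{f}}(x_0,y_0)\begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix} + \begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\ -\sqrt{2} & \frac{\sqrt{2}}{2} \end{bmatrix}\begin{bmatrix} x - 0 \\ y - \frac{\pi}{4} \end{bmatrix} \\
&= \begin{bmatrix} \frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2}x - \frac{\sqrt{2}}{2}\left(y - \frac{\pi}{4}\right) \\ \frac{\sqrt{2}}{2} - \sqrt{2}x + \frac{\sqrt{2}}{2}\left(y - \frac{\pi}{4}\right) \end{bmatrix}.
\end{aligned}$$

At the point $(x_1,y_1) = (0.1, 0.8)$, we have

$$\mathbf{f}(x_1,y_1) \approx \begin{bmatrix} 0.76998 \\ 0.58732 \end{bmatrix}, \qquad L_{\mathbf{f}}(x_1,y_1) \approx \begin{bmatrix} 0.76749 \\ 0.57601 \end{bmatrix}.$$

Because of the relatively small partial derivatives at $(x_0,y_0)$, the linearization at this point yields a fairly accurate approximation at $(x_1,y_1)$. □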
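The numbers in this example can be reproduced with a short Python sketch using only the standard `math` module; the function names are illustrative. Each component of the linearization is computed as $f_i(\mathbf{p}_0)$ plus the dot product of row $i$ of the Jacobian with $\mathbf{p} - \mathbf{p}_0$.

```python
import math

def f(x, y):
    return [math.exp(x) * math.cos(y), math.exp(-2 * x) * math.sin(y)]

def jacobian(x, y):
    """Rows are the gradients of the component functions f1 and f2."""
    return [[math.exp(x) * math.cos(y), -math.exp(x) * math.sin(y)],
            [-2 * math.exp(-2 * x) * math.sin(y), math.exp(-2 * x) * math.cos(y)]]

def linearization(x0, y0):
    f0, J = f(x0, y0), jacobian(x0, y0)
    def L(x, y):
        d = [x - x0, y - y0]
        # i-th component: f_i(p0) + (row i of J) . (p - p0)
        return [f0[i] + sum(J[i][k] * d[k] for k in range(2)) for i in range(2)]
    return L

L = linearization(0.0, math.pi / 4)
print([round(v, 5) for v in f(0.1, 0.8)])   # [0.76998, 0.58732]
print([round(v, 5) for v in L(0.1, 0.8)])   # [0.76749, 0.57601]
```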

1.5.5 Differentiability

Before using a linearization to approximate a function near a point $\mathbf{p}_0$, it is helpful to know whether this linearization is actually an accurate approximation of the function in the first place. That is, we need to know if the function is differentiable at $\mathbf{p}_0$, which, informally, means that its instantaneous rate of change at $\mathbf{p}_0$ is well-defined. In the single-variable case, a function $f(x)$ is differentiable at $x_0$ if $f'(x_0)$ exists; that is, if the limit

$$f'(x_0) = \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x - x_0}$$

exists. In other words, we must have

$$\lim_{x\to x_0} \frac{f(x) - f(x_0) - f'(x_0)(x - x_0)}{x - x_0} = 0.$$

But $f(x_0) + f'(x_0)(x - x_0)$ is just the linearization of $f$ at $x_0$, so we can say that $f$ is differentiable at $x_0$ if and only if

$$\lim_{x\to x_0} \frac{f(x) - L_f(x)}{x - x_0} = 0.$$

Note that this is a stronger statement than simply requiring that

$$\lim_{x\to x_0} \left[f(x) - L_f(x)\right] = 0,$$

because as $x$ approaches $x_0$, $|1/(x - x_0)|$ approaches $\infty$, so the difference $f(x) - L_f(x)$ must approach zero particularly rapidly in order for the fraction $[f(x) - L_f(x)]/(x - x_0)$ to approach zero. That is, the linearization must be a sufficiently accurate approximation of $f$ near $x_0$ for this to be the case, in order for $f$ to be differentiable at $x_0$.

This notion of differentiability is readily generalized to functions of several variables. Given $\mathbf{f} : D \subseteq \mathbb{R}^n \to \mathbb{R}^m$, and $\mathbf{p}_0 \in D$, we say that $\mathbf{f}$ is differentiable at $\mathbf{p}_0$ if

$$\lim_{\mathbf{p}\to\mathbf{p}_0} \frac{\|\mathbf{f}(\mathbf{p}) - L_{\mathbf{f}}(\mathbf{p})\|}{\|\mathbf{p} - \mathbf{p}_0\|} = 0,$$

where $L_{\mathbf{f}}(\mathbf{p})$ is the linearization of $\mathbf{f}$ at $\mathbf{p}_0$.

Example Let $f(x,y) = x^2y$. To verify that this function is differentiable at $(x_0,y_0) = (1,1)$, we first compute $f_x = 2xy$ and $f_y = x^2$. It follows that the linearization of $f$ at $(1,1)$ is

$$L_f(x,y) = f(1,1) + f_x(1,1)(x-1) + f_y(1,1)(y-1) = 1 + 2(x-1) + (y-1) = 2x + y - 2.$$

Therefore, $f$ is differentiable at $(1,1)$ if

$$\lim_{(x,y)\to(1,1)} \frac{|x^2y - (2x + y - 2)|}{\|(x,y) - (1,1)\|} = \lim_{(x,y)\to(1,1)} \frac{|x^2y - (2x + y - 2)|}{\sqrt{(x-1)^2 + (y-1)^2}} = 0.$$

By rewriting this expression as

$$\frac{|x^2y - (2x + y - 2)|}{\sqrt{(x-1)^2 + (y-1)^2}} = \frac{|x-1|\,|y(x+1) - 2|}{\sqrt{(x-1)^2 + (y-1)^2}},$$

and noting that

$$\lim_{(x,y)\to(1,1)} |y(x+1) - 2| = 0, \qquad 0 \le \frac{|x-1|}{\sqrt{(x-1)^2 + (y-1)^2}} \le 1,$$

we conclude that the limit actually is zero, and therefore $f$ is differentiable at $(1,1)$. □
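The limit in this example can also be explored numerically. The Python sketch below (sample directions and distances are arbitrary) evaluates the ratio $|f(\mathbf{p}) - L_f(\mathbf{p})|/\|\mathbf{p} - (1,1)\|$ at distance $r$ from $(1,1)$. The factorization above bounds it by $|y(x+1) - 2| \le 3r + r^2$, so it shrinks roughly in proportion to $r$; sampling illustrates, but does not prove, the limit.

```python
import math

def f(x, y):
    return x**2 * y

def L(x, y):
    return 2 * x + y - 2          # linearization of f at (1, 1)

def ratio(r, angle):
    """|f(p) - L_f(p)| / ||p - (1, 1)|| at distance r from (1, 1) in direction angle."""
    x, y = 1 + r * math.cos(angle), 1 + r * math.sin(angle)
    return abs(f(x, y) - L(x, y)) / math.hypot(x - 1, y - 1)

angles = [0.0, 1.0, 2.5, 4.0]     # arbitrary sample directions
for r in [1e-1, 1e-2, 1e-3]:
    print(r, max(ratio(r, a) for a in angles))
```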

There are three important conclusions that we can make regarding differentiable functions:

• If all first partial derivatives of $f$ exist in a neighborhood of $\mathbf{p}_0$ and are continuous at $\mathbf{p}_0$, then $f$ is differentiable at $\mathbf{p}_0$.

• Furthermore, if $f$ is differentiable at $\mathbf{p}_0$, then it is continuous at $\mathbf{p}_0$. Note that the converse is not true; for example, $f(x) = |x|$ is continuous at $x = 0$, but it is not differentiable there, because $f'(0)$ does not exist.

• If $f$ is differentiable at $\mathbf{p}_0$, then its first partial derivatives exist at $\mathbf{p}_0$. This statement might seem redundant, because the first partial derivatives are used in the definition of the linearization, but it is important nonetheless, because the converse of this statement is not true. That is, if a function's first partial derivatives exist at a point, it is not necessarily differentiable at that point.


The notion of differentiability is related not only to partial derivatives, which only describe how a function changes as one of its variables changes, but also to the instantaneous rate of change of a function as its variables change along any direction. If a function is differentiable at a point, then its rate of change along any direction is well-defined. We will explore this idea further later in this chapter.

1.6 The Chain Rule

Recall from single-variable calculus that if a function $g(x)$ is differentiable at $x_0$, and $f(x)$ is differentiable at $g(x_0)$, then the derivative of the composition $(f\circ g)(x) = f(g(x))$ is given by the Chain Rule

$$(f\circ g)'(x_0) = f'(g(x_0))\,g'(x_0).$$

We now generalize the Chain Rule to functions of several variables. Let $\mathbf{f} : D \subseteq \mathbb{R}^n \to \mathbb{R}^m$, and let $\mathbf{g} : U \subseteq \mathbb{R}^p \to D$. That is, the values of $\mathbf{g}$ lie in the domain of $\mathbf{f}$.

Assume that $\mathbf{g}$ is differentiable at a point $\mathbf{p}_0 \in U$, and that $\mathbf{f}$ is differentiable at the point $\mathbf{q}_0 = \mathbf{g}(\mathbf{p}_0)$. Then, $\mathbf{f}$ has a Jacobian matrix $J_{\mathbf{f}}(\mathbf{q}_0)$, and $\mathbf{g}$ has a Jacobian matrix $J_{\mathbf{g}}(\mathbf{p}_0)$. These matrices contain the first partial derivatives of $\mathbf{f}$ and $\mathbf{g}$ evaluated at $\mathbf{q}_0$ and $\mathbf{p}_0$, respectively.

Then, the Chain Rule states that the derivative of the composition $(\mathbf{f}\circ\mathbf{g}) : U \to \mathbb{R}^m$, defined by $(\mathbf{f}\circ\mathbf{g})(\mathbf{x}) = \mathbf{f}(\mathbf{g}(\mathbf{x}))$, at $\mathbf{p}_0$, is given by the Jacobian matrix

$$J_{\mathbf{f}\circ\mathbf{g}}(\mathbf{p}_0) = J_{\mathbf{f}}(\mathbf{g}(\mathbf{p}_0))\,J_{\mathbf{g}}(\mathbf{p}_0).$$

That is, the derivative of $\mathbf{f}\circ\mathbf{g}$ at $\mathbf{p}_0$ is the product, in the sense of matrix multiplication, of the derivative of $\mathbf{f}$ at $\mathbf{g}(\mathbf{p}_0)$ and the derivative of $\mathbf{g}$ at $\mathbf{p}_0$. This is entirely analogous to the Chain Rule from single-variable calculus, in which the derivative of $f\circ g$ at $x_0$ is the product of the derivative of $f$ at $g(x_0)$ and the derivative of $g$ at $x_0$.

It follows from the rules of matrix multiplication that the partial derivative of the $i$th component function of $\mathbf{f}\circ\mathbf{g}$ with respect to the variable $x_j$, an independent variable of $\mathbf{g}$, is given by the dot product of the gradient of the $i$th component function of $\mathbf{f}$ with the vector that contains the partial derivatives of the component functions of $\mathbf{g}$ with respect to $x_j$. We now illustrate the application of this general Chain Rule with some examples.


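Example Let $f : \mathbb{R}^3 \to \mathbb{R}$ be defined by

$$f(x,y,z) = e^z\cos 2x\sin 3y,$$

and let $\mathbf{g} : \mathbb{R} \to \mathbb{R}^3$ be a vector-valued function of one variable defined by

$$\mathbf{g}(t) = \langle x(t), y(t), z(t)\rangle = \langle 2t, t^2, t^3\rangle.$$

Then, $f\circ\mathbf{g}$ is a scalar-valued function of $t$,

$$(f\circ\mathbf{g})(t) = e^{z(t)}\cos 2x(t)\sin 3y(t) = e^{t^3}\cos 4t\sin 3t^2.$$

To compute its derivative with respect to $t$, we first compute

$$\nabla f = \begin{bmatrix} f_x & f_y & f_z \end{bmatrix} = \begin{bmatrix} -2e^z\sin 2x\sin 3y & 3e^z\cos 2x\cos 3y & e^z\cos 2x\sin 3y \end{bmatrix},$$

and

$$\mathbf{g}'(t) = \langle x'(t), y'(t), z'(t)\rangle = \langle 2, 2t, 3t^2\rangle,$$

and then apply the Chain Rule to obtain

$$\begin{aligned}
\frac{df}{dt} &= \nabla f(x(t),y(t),z(t))\cdot\mathbf{g}'(t) \\
&= \begin{bmatrix} f_x(x(t),y(t),z(t)) & f_y(x(t),y(t),z(t)) & f_z(x(t),y(t),z(t)) \end{bmatrix}\begin{bmatrix} \frac{dx}{dt} \\ \frac{dy}{dt} \\ \frac{dz}{dt} \end{bmatrix} \\
&= f_x(x(t),y(t),z(t))\frac{dx}{dt} + f_y(x(t),y(t),z(t))\frac{dy}{dt} + f_z(x(t),y(t),z(t))\frac{dz}{dt} \\
&= \left(-2e^{z(t)}\sin 2x(t)\sin 3y(t)\right)(2) + \left(3e^{z(t)}\cos 2x(t)\cos 3y(t)\right)(2t) + \left(e^{z(t)}\cos 2x(t)\sin 3y(t)\right)(3t^2) \\
&= -4e^{t^3}\sin 4t\sin 3t^2 + 6te^{t^3}\cos 4t\cos 3t^2 + 3t^2e^{t^3}\cos 4t\sin 3t^2.
\end{aligned}$$

□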
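A finite-difference check of this Chain Rule computation, as a Python sketch (the sample values of $t$ and the step $h$ are arbitrary): it differentiates $(f\circ\mathbf{g})(t) = e^{t^3}\cos 4t\sin 3t^2$ numerically and compares the result with the formula just derived.

```python
import math

def composite(t):
    """(f o g)(t) = e^{t^3} cos(4t) sin(3t^2)."""
    return math.exp(t**3) * math.cos(4 * t) * math.sin(3 * t**2)

def chain_rule_derivative(t):
    """d/dt (f o g)(t) as obtained from the Chain Rule in the example."""
    e = math.exp(t**3)
    return (-4 * e * math.sin(4 * t) * math.sin(3 * t**2)
            + 6 * t * e * math.cos(4 * t) * math.cos(3 * t**2)
            + 3 * t**2 * e * math.cos(4 * t) * math.sin(3 * t**2))

h = 1e-6
for t in [0.0, 0.4, 0.9]:
    fd = (composite(t + h) - composite(t - h)) / (2 * h)
    print(t, fd, chain_rule_derivative(t))
```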


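Example Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by

$$f(x,y) = x^2y + xy^2,$$

and let $\mathbf{g} : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by

$$\mathbf{g}(s,t) = \begin{bmatrix} x(s,t) \\ y(s,t) \end{bmatrix} = \begin{bmatrix} 2s + t \\ s - 2t \end{bmatrix}.$$

Then, $f\circ\mathbf{g}$ is a scalar-valued function of $s$ and $t$,

$$(f\circ\mathbf{g})(s,t) = x(s,t)^2y(s,t) + x(s,t)y(s,t)^2 = (2s+t)^2(s-2t) + (2s+t)(s-2t)^2.$$

To compute its gradient, which includes its partial derivatives with respect to $s$ and $t$, we first compute

$$\nabla f = \begin{bmatrix} f_x & f_y \end{bmatrix} = \begin{bmatrix} 2xy + y^2 & x^2 + 2xy \end{bmatrix},$$

and

$$J_{\mathbf{g}}(s,t) = \begin{bmatrix} x_s & x_t \\ y_s & y_t \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & -2 \end{bmatrix},$$

and then apply the Chain Rule to obtain

$$\begin{aligned}
\nabla(f\circ\mathbf{g})(s,t) &= \nabla f(x(s,t),y(s,t))\,J_{\mathbf{g}}(s,t) \\
&= \begin{bmatrix} f_x(x(s,t),y(s,t)) & f_y(x(s,t),y(s,t)) \end{bmatrix}\begin{bmatrix} x_s & x_t \\ y_s & y_t \end{bmatrix} \\
&= \begin{bmatrix} f_x(x(s,t),y(s,t))\,x_s + f_y(x(s,t),y(s,t))\,y_s & f_x(x(s,t),y(s,t))\,x_t + f_y(x(s,t),y(s,t))\,y_t \end{bmatrix} \\
&= \begin{bmatrix} [2x(s,t)y(s,t) + y(s,t)^2](2) + [x(s,t)^2 + 2x(s,t)y(s,t)](1) & [2x(s,t)y(s,t) + y(s,t)^2](1) + [x(s,t)^2 + 2x(s,t)y(s,t)](-2) \end{bmatrix} \\
&= \begin{bmatrix} 4(2s+t)(s-2t) + 2(s-2t)^2 + (2s+t)^2 + 2(2s+t)(s-2t) & 2(2s+t)(s-2t) + (s-2t)^2 - 2(2s+t)^2 - 4(2s+t)(s-2t) \end{bmatrix}.
\end{aligned}$$

□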
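The matrix form of the Chain Rule used here can be checked in Python. In this sketch (function names are illustrative), `grad_composite_chain_rule` multiplies $\nabla f(x(s,t),y(s,t))$ by $J_{\mathbf{g}}$, `grad_composite_formula` evaluates the final expression of the example, and a central difference of $f\circ\mathbf{g}$ at an arbitrary point serves as an independent check.

```python
def f(x, y):
    return x**2 * y + x * y**2

def g(s, t):
    return 2 * s + t, s - 2 * t

def grad_composite_chain_rule(s, t):
    """[d/ds, d/dt] of f(g(s, t)) via grad f(g(s, t)) times J_g."""
    x, y = g(s, t)
    fx, fy = 2 * x * y + y**2, x**2 + 2 * x * y
    Jg = [[2, 1],
          [1, -2]]
    return [fx * Jg[0][0] + fy * Jg[1][0], fx * Jg[0][1] + fy * Jg[1][1]]

def grad_composite_formula(s, t):
    """The final expression obtained in the example."""
    a, b = 2 * s + t, s - 2 * t
    return [4 * a * b + 2 * b**2 + a**2 + 2 * a * b,
            2 * a * b + b**2 - 2 * a**2 - 4 * a * b]

# Integer inputs keep both computations exact.
for s, t in [(0, 0), (1, 2), (-3, 5), (4, -1)]:
    print((s, t), grad_composite_chain_rule(s, t), grad_composite_formula(s, t))

def composite(s, t):
    return f(*g(s, t))

h = 1e-5
s, t = 1.5, -0.5
fd = [(composite(s + h, t) - composite(s - h, t)) / (2 * h),
      (composite(s, t + h) - composite(s, t - h)) / (2 * h)]
print(fd, grad_composite_chain_rule(s, t))
```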

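Example Let $f : \mathbb{R} \to \mathbb{R}$ be defined by

$$f(x) = x^3 + 2x^2,$$

and let $g : \mathbb{R}^2 \to \mathbb{R}$ be defined by

$$g(u,v) = \sin u\cos v.$$

Then $f\circ g$ is a scalar-valued function of $u$ and $v$,

$$(f\circ g)(u,v) = (\sin u\cos v)^3 + 2(\sin u\cos v)^2.$$

To compute its gradient, which includes partial derivatives with respect to $u$ and $v$, we first compute

$$f'(x) = 3x^2 + 4x,$$

and

$$\nabla g = \begin{bmatrix} g_u & g_v \end{bmatrix} = \begin{bmatrix} \cos u\cos v & -\sin u\sin v \end{bmatrix},$$

and then use the Chain Rule to obtain

$$\begin{aligned}
\nabla(f\circ g)(u,v) &= f'(g(u,v))\,\nabla g(u,v) \\
&= \left[3(g(u,v))^2 + 4g(u,v)\right]\begin{bmatrix} \cos u\cos v & -\sin u\sin v \end{bmatrix} \\
&= \left[3\sin^2 u\cos^2 v + 4\sin u\cos v\right]\begin{bmatrix} \cos u\cos v & -\sin u\sin v \end{bmatrix}.
\end{aligned}$$

□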
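As with the previous examples, a central-difference check in Python can confirm the result; the sample point $(u,v)$ and step $h$ below are arbitrary choices, and `grad_chain_rule` is an illustrative name for the formula derived above.

```python
import math

def composite(u, v):
    w = math.sin(u) * math.cos(v)
    return w**3 + 2 * w**2

def grad_chain_rule(u, v):
    """f'(g(u, v)) times grad g(u, v), as in the example."""
    w = math.sin(u) * math.cos(v)
    scale = 3 * w**2 + 4 * w
    return [scale * math.cos(u) * math.cos(v), -scale * math.sin(u) * math.sin(v)]

u, v, h = 0.7, -1.2, 1e-6
fd = [(composite(u + h, v) - composite(u - h, v)) / (2 * h),
      (composite(u, v + h) - composite(u, v - h)) / (2 * h)]
print(fd, grad_chain_rule(u, v))
```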

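Example Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by

$$f(x, y) = \begin{bmatrix} f_1(x, y) \\ f_2(x, y) \end{bmatrix} = \begin{bmatrix} x^2 y \\ x y^2 \end{bmatrix},$$

and let $g : \mathbb{R} \to \mathbb{R}^2$ be defined by

$$g(t) = \langle x(t), y(t) \rangle = \langle \cos t, \sin t \rangle.$$

Then $f \circ g$ is a vector-valued function of $t$,

$$(f \circ g)(t) = \langle \cos^2 t \sin t, \cos t \sin^2 t \rangle.$$

To compute its derivative with respect to $t$, we first compute

$$J_f(x, y) = \begin{bmatrix} (f_1)_x & (f_1)_y \\ (f_2)_x & (f_2)_y \end{bmatrix} = \begin{bmatrix} 2xy & x^2 \\ y^2 & 2xy \end{bmatrix},$$

and $g'(t) = \langle -\sin t, \cos t \rangle$, and then use the Chain Rule to obtain

$$
\begin{aligned}
(f \circ g)'(t) &= J_f(x(t), y(t))\, g'(t) = \begin{bmatrix} (f_1)_x(x(t), y(t)) & (f_1)_y(x(t), y(t)) \\ (f_2)_x(x(t), y(t)) & (f_2)_y(x(t), y(t)) \end{bmatrix} \begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} \\
&= \begin{bmatrix} 2x(t)y(t) & x(t)^2 \\ y(t)^2 & 2x(t)y(t) \end{bmatrix} \begin{bmatrix} -\sin t \\ \cos t \end{bmatrix} \\
&= \langle 2\cos t \sin t(-\sin t) + \cos^2 t(\cos t),\ \sin^2 t(-\sin t) + 2\cos t \sin t(\cos t) \rangle \\
&= \langle -2\cos t \sin^2 t + \cos^3 t,\ -\sin^3 t + 2\cos^2 t \sin t \rangle.
\end{aligned}
$$

□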
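For a vector-valued composition, the Chain Rule becomes a matrix-vector product. The following sketch uses illustrative helper names and only the standard library. At a few values of $t$, it evaluates $J_f(g(t))\, g'(t)$, the simplified closed form, and a central difference of $f \circ g$.

```python
import math

# f(x, y) = (x^2 y, x y^2) and g(t) = (cos t, sin t), as in the example.
def f(x, y):
    return (x ** 2 * y, x * y ** 2)

def jacobian_f(x, y):
    return [[2 * x * y, x ** 2],
            [y ** 2, 2 * x * y]]

def chain_rule_derivative(t):
    """Matrix-vector product J_f(g(t)) g'(t)."""
    J = jacobian_f(math.cos(t), math.sin(t))
    dg = (-math.sin(t), math.cos(t))
    return [J[i][0] * dg[0] + J[i][1] * dg[1] for i in range(2)]

def closed_form(t):
    """The simplified result derived in the example."""
    c, s = math.cos(t), math.sin(t)
    return [-2 * c * s ** 2 + c ** 3, -s ** 3 + 2 * c ** 2 * s]

def finite_difference(t, h=1e-6):
    plus = f(math.cos(t + h), math.sin(t + h))
    minus = f(math.cos(t - h), math.sin(t - h))
    return [(p - m) / (2 * h) for p, m in zip(plus, minus)]

for t in (0.0, 1.0, 2.5):
    print(t, chain_rule_derivative(t), closed_form(t), finite_difference(t))
```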

1.6.1 The Implicit Function Theorem

The Chain Rule can also be used to compute partial derivatives of implicitly defined functions in a more convenient way than is provided by implicit differentiation. Let the equation $F(x, y) = 0$ implicitly define $y$ as a differentiable function of $x$. That is, $y = f(x)$ where $F(x, f(x)) = 0$ for $x$ in the domain of $f$. If $F$ is differentiable, then, by the Chain Rule, we can differentiate the equation $F(x, y(x)) = 0$ with respect to $x$ and obtain

$$F_x + F_y \frac{dy}{dx} = 0,$$

which yields

$$\frac{dy}{dx} = -\frac{F_x}{F_y}.$$

By the Implicit Function Theorem, the equation $F(x, y) = 0$ defines $y$ implicitly as a function of $x$ near $(x_0, y_0)$, where $F(x_0, y_0) = 0$, provided that $F_y(x_0, y_0) \neq 0$ and $F_x$ and $F_y$ are continuous near $(x_0, y_0)$. Under these conditions, we see that $dy/dx$ is defined at $(x_0, y_0)$ as well.

Example Let $F : \mathbb{R}^2 \to \mathbb{R}$ be defined by

$$F(x, y) = x^2 + y^2 - 4.$$

The equation $F(x, y) = 0$ defines $y$ implicitly as a function of $x$, provided that $F$ satisfies the conditions of the Implicit Function Theorem.

We have

$$F_x = 2x, \quad F_y = 2y.$$

Since both of these partial derivatives are polynomials, and therefore are continuous on all of $\mathbb{R}^2$, it follows that if $F_y \neq 0$, then $y$ can be implicitly defined as a function of $x$ near a point where $F(x, y) = 0$, and

$$\frac{dy}{dx} = -\frac{F_x}{F_y} = -\frac{x}{y}.$$

For example, at the point $(x, y) = (0, 2)$, we have $F(x, y) = 0$ and $F_y = 4$. Therefore, $y$ can be implicitly defined as a function of $x$ near this point, and at $x = 0$, we have $dy/dx = 0$. □
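The point of the formula $dy/dx = -F_x/F_y$ is that $y(x)$ never has to be found explicitly. To illustrate this, the sketch below computes $y(x)$ on the branch through $(0, 2)$ only numerically, by applying Newton's method to $F(x, \cdot) = 0$. Newton's method is our choice of solver, not one used in the text. The sketch then compares a difference quotient of that numerical solution with $-F_x/F_y$.

```python
# F(x, y) = x^2 + y^2 - 4 from the example.
def F(x, y):
    return x ** 2 + y ** 2 - 4

def F_x(x, y):
    return 2 * x

def F_y(x, y):
    return 2 * y

def solve_y(x, y_guess=2.0, tol=1e-14, max_iter=50):
    """Newton's method on y -> F(x, y) = 0, started on the branch through (0, 2)."""
    y = y_guess
    for _ in range(max_iter):
        step = F(x, y) / F_y(x, y)
        y -= step
        if abs(step) < tol:
            break
    return y

def implicit_slope(x):
    """dy/dx = -F_x / F_y, evaluated on the numerically computed branch."""
    y = solve_y(x)
    return -F_x(x, y) / F_y(x, y)

def numerical_slope(x, h=1e-6):
    return (solve_y(x + h) - solve_y(x - h)) / (2 * h)

for x in (0.0, 0.5, 1.2):
    print(x, implicit_slope(x), numerical_slope(x))
```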

More generally, let $F : D \subseteq \mathbb{R}^{n+1} \to \mathbb{R}$, and let $p_0 = (x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)}, y^{(0)}) \in D$ be such that $F(x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)}, y^{(0)}) = 0$. In this case, the Implicit Function Theorem states that if $F_y(p_0) \neq 0$, and all first partial derivatives of $F$ are continuous near $p_0$, then this equation defines $y$ as a function of $x_1, x_2, \ldots, x_n$ near $p_0$, and

$$\frac{\partial y}{\partial x_i} = -\frac{F_{x_i}}{F_y}, \quad i = 1, 2, \ldots, n.$$

To see this, we differentiate the equation $F(x_1, x_2, \ldots, x_n, y(x_1, x_2, \ldots, x_n)) = 0$ with respect to $x_i$ to obtain the equation

$$F_{x_i} + F_y \frac{\partial y}{\partial x_i} = 0,$$

where all partial derivatives are evaluated at $p_0$, and solve for $\partial y/\partial x_i$ at $p_0$.

Example Let $F : \mathbb{R}^3 \to \mathbb{R}$ be defined by

$$F(x, y, z) = x^2 z + z^2 y - 2xyz + 1.$$

The equation $F(x, y, z) = 0$ defines $z$ implicitly as a function of $x$ and $y$, provided that $F$ satisfies the conditions of the Implicit Function Theorem.

We have

$$F_x = 2xz - 2yz, \quad F_y = z^2 - 2xz, \quad F_z = x^2 + 2yz - 2xy.$$

Since all of these partial derivatives are polynomials, and therefore are continuous on all of $\mathbb{R}^3$, it follows that if $F_z \neq 0$, then $z$ can be implicitly defined as a function of $x$ and $y$ near a point where $F(x, y, z) = 0$, and

$$z_x = -\frac{F_x}{F_z} = \frac{2yz - 2xz}{x^2 + 2yz - 2xy}, \quad z_y = -\frac{F_y}{F_z} = \frac{2xz - z^2}{x^2 + 2yz - 2xy}.$$

For example, at the point $(x, y, z) = (1, 0, -1)$, we have $F(x, y, z) = 0$ and $F_z = 1$. Therefore, $z$ can be implicitly defined as a function of $x$ and $y$ near this point, and at $(x, y) = (1, 0)$, we have $z_x = 2$ and $z_y = -3$. □
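The partial derivatives $z_x$ and $z_y$ at $(1, 0, -1)$ can be checked numerically. The sketch below solves $F(x, y, z) = 0$ for $z$ near $z = -1$ using Newton's method (an illustrative choice) and then differences the result. It uses $F_x = 2xz - 2yz$, $F_y = z^2 - 2xz$ and $F_z = x^2 + 2yz - 2xy$, obtained by differentiating $F$ directly. The helper names are hypothetical.

```python
# F(x, y, z) = x^2 z + z^2 y - 2xyz + 1 from the example.
def F(x, y, z):
    return x ** 2 * z + z ** 2 * y - 2 * x * y * z + 1

def F_x(x, y, z):
    return 2 * x * z - 2 * y * z

def F_y(x, y, z):
    return z ** 2 - 2 * x * z

def F_z(x, y, z):
    return x ** 2 + 2 * y * z - 2 * x * y

def solve_z(x, y, z_guess=-1.0, tol=1e-14, max_iter=50):
    """Newton's method on z -> F(x, y, z) = 0, started at the known root z = -1."""
    z = z_guess
    for _ in range(max_iter):
        step = F(x, y, z) / F_z(x, y, z)
        z -= step
        if abs(step) < tol:
            break
    return z

def partials_by_formula(x, y, z):
    """z_x = -F_x / F_z and z_y = -F_y / F_z."""
    return -F_x(x, y, z) / F_z(x, y, z), -F_y(x, y, z) / F_z(x, y, z)

def partials_by_finite_differences(x, y, h=1e-6):
    z_x = (solve_z(x + h, y) - solve_z(x - h, y)) / (2 * h)
    z_y = (solve_z(x, y + h) - solve_z(x, y - h)) / (2 * h)
    return z_x, z_y

print(partials_by_formula(1.0, 0.0, -1.0))       # (2.0, -3.0)
print(partials_by_finite_differences(1.0, 0.0))  # approximately (2.0, -3.0)
```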

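We now consider the most general case. Let $\mathbf{F} : D \subseteq \mathbb{R}^{n+m} \to \mathbb{R}^m$, and let

$$p_0 = (x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)}, y_1^{(0)}, y_2^{(0)}, \ldots, y_m^{(0)}) \in D$$

be such that

$$\mathbf{F}(x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)}, y_1^{(0)}, y_2^{(0)}, \ldots, y_m^{(0)}) = \mathbf{0}.$$

Suppose $y_1, y_2, \ldots, y_m$ are differentiable functions of $x_1, x_2, \ldots, x_n$ that satisfy $\mathbf{F}(x_1, \ldots, x_n, y_1, \ldots, y_m) = \mathbf{0}$ near $p_0$. Then differentiating this system of equations with respect to $x_i$ yields the systems of linear equations

$$\mathbf{F}_{x_i} + \mathbf{F}_{y_1} \frac{\partial y_1}{\partial x_i} + \mathbf{F}_{y_2} \frac{\partial y_2}{\partial x_i} + \cdots + \mathbf{F}_{y_m} \frac{\partial y_m}{\partial x_i} = \mathbf{0}, \quad i = 1, 2, \ldots, n,$$

where all partial derivatives are evaluated at $p_0$.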


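To examine the solvability of these systems of equations, we first define $x_0 = (x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)})$, and denote the component functions of the vector-valued function $\mathbf{F}$ by $\mathbf{F} = \langle F_1, F_2, \ldots, F_m \rangle$. We then define the Jacobian matrices

$$J_{x,\mathbf{F}}(p_0) = \begin{bmatrix} \frac{\partial F_1}{\partial x_1}(p_0) & \frac{\partial F_1}{\partial x_2}(p_0) & \cdots & \frac{\partial F_1}{\partial x_n}(p_0) \\ \frac{\partial F_2}{\partial x_1}(p_0) & \frac{\partial F_2}{\partial x_2}(p_0) & \cdots & \frac{\partial F_2}{\partial x_n}(p_0) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial F_m}{\partial x_1}(p_0) & \frac{\partial F_m}{\partial x_2}(p_0) & \cdots & \frac{\partial F_m}{\partial x_n}(p_0) \end{bmatrix},$$

$$J_{y,\mathbf{F}}(p_0) = \begin{bmatrix} \frac{\partial F_1}{\partial y_1}(p_0) & \frac{\partial F_1}{\partial y_2}(p_0) & \cdots & \frac{\partial F_1}{\partial y_m}(p_0) \\ \frac{\partial F_2}{\partial y_1}(p_0) & \frac{\partial F_2}{\partial y_2}(p_0) & \cdots & \frac{\partial F_2}{\partial y_m}(p_0) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial F_m}{\partial y_1}(p_0) & \frac{\partial F_m}{\partial y_2}(p_0) & \cdots & \frac{\partial F_m}{\partial y_m}(p_0) \end{bmatrix},$$

and

$$J_y(x_0) = \begin{bmatrix} \frac{\partial y_1}{\partial x_1}(x_0) & \frac{\partial y_1}{\partial x_2}(x_0) & \cdots & \frac{\partial y_1}{\partial x_n}(x_0) \\ \frac{\partial y_2}{\partial x_1}(x_0) & \frac{\partial y_2}{\partial x_2}(x_0) & \cdots & \frac{\partial y_2}{\partial x_n}(x_0) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1}(x_0) & \frac{\partial y_m}{\partial x_2}(x_0) & \cdots & \frac{\partial y_m}{\partial x_n}(x_0) \end{bmatrix}.$$

Then, from our previous differentiation with respect to $x_i$, for each $i = 1, 2, \ldots, n$, we can concisely express our systems of equations as a single system

$$J_{x,\mathbf{F}}(p_0) + J_{y,\mathbf{F}}(p_0)\, J_y(x_0) = 0.$$

Suppose the matrix $J_{y,\mathbf{F}}(p_0)$ is invertible (also called nonsingular), which is the case if and only if its determinant is nonzero. Suppose also that all first partial derivatives of $\mathbf{F}$ are continuous near $p_0$. Then the equation $\mathbf{F}(p) = \mathbf{0}$ implicitly defines $y_1, y_2, \ldots, y_m$ as functions of $x_1, x_2, \ldots, x_n$ near $p_0$, and

$$J_y(x_0) = -[J_{y,\mathbf{F}}(p_0)]^{-1} J_{x,\mathbf{F}}(p_0),$$

where $[J_{y,\mathbf{F}}(p_0)]^{-1}$ is the inverse of the matrix $J_{y,\mathbf{F}}(p_0)$.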

Example Let $\mathbf{F} : \mathbb{R}^4 \to \mathbb{R}^2$ be defined by

$$\mathbf{F}(x, y, u, v) = \begin{bmatrix} F_1(x, y, u, v) \\ F_2(x, y, u, v) \end{bmatrix} = \begin{bmatrix} xu + y^2 v \\ x^2 v + yu + 1 \end{bmatrix}.$$

Then the vector equation $\mathbf{F}(x, y, u, v) = \mathbf{0}$ implicitly defines $(u, v)$ as a function of $(x, y)$, provided that $\mathbf{F}$ satisfies the conditions of the Implicit Function Theorem. We will compute the partial derivatives of $u$ and $v$ with respect to $x$ and $y$ at a point that satisfies this equation.

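We have

$$J_{(x,y),\mathbf{F}}(x, y, u, v) = \begin{bmatrix} \frac{\partial F_1}{\partial x} & \frac{\partial F_1}{\partial y} \\ \frac{\partial F_2}{\partial x} & \frac{\partial F_2}{\partial y} \end{bmatrix} = \begin{bmatrix} u & 2yv \\ 2xv & u \end{bmatrix},$$

$$J_{(u,v),\mathbf{F}}(x, y, u, v) = \begin{bmatrix} \frac{\partial F_1}{\partial u} & \frac{\partial F_1}{\partial v} \\ \frac{\partial F_2}{\partial u} & \frac{\partial F_2}{\partial v} \end{bmatrix} = \begin{bmatrix} x & y^2 \\ y & x^2 \end{bmatrix}.$$

From the formula for the inverse of a $2 \times 2$ matrix,

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix},$$

we obtain

$$
\begin{aligned}
J_{(u,v)}(x, y) = \begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix} &= -[J_{(u,v),\mathbf{F}}(x, y, u, v)]^{-1} J_{(x,y),\mathbf{F}}(x, y, u, v) \\
&= -\frac{1}{x^3 - y^3} \begin{bmatrix} x^2 & -y^2 \\ -y & x \end{bmatrix} \begin{bmatrix} u & 2yv \\ 2xv & u \end{bmatrix} \\
&= \frac{1}{y^3 - x^3} \begin{bmatrix} x^2 u - 2xy^2 v & 2x^2 yv - y^2 u \\ 2x^2 v - yu & xu - 2y^2 v \end{bmatrix}.
\end{aligned}
$$

These partial derivatives can then be evaluated at any point $(x, y, u, v)$ such that $\mathbf{F}(x, y, u, v) = \mathbf{0}$, such as $(x, y, u, v) = (0, 1, -1, 0)$, where $u_x = 0$, $u_y = 1$, $v_x = 1$ and $v_y = 0$. Note that the matrix $J_{(u,v),\mathbf{F}}(x, y, u, v)$ is not invertible (that is, it is singular) if its determinant $x^3 - y^3 = 0$, that is, if $x = y$. When this is the case, the Implicit Function Theorem cannot be used to conclude that $(u, v)$ is implicitly defined as a function of $(x, y)$. □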
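The formula $J_{(u,v)} = -[J_{(u,v),\mathbf{F}}]^{-1} J_{(x,y),\mathbf{F}}$ can be evaluated numerically. It can then be cross-checked against finite differences of $(u, v)$, obtained by solving $\mathbf{F} = \mathbf{0}$ with Newton's method (our choice of solver). The sketch works at $(x, y, u, v) = (0, 1, -1, 0)$, which satisfies $\mathbf{F} = \mathbf{0}$. All helper names are hypothetical.

```python
# F(x, y, u, v) = (x u + y^2 v, x^2 v + y u + 1) from the example.
def F(x, y, u, v):
    return (x * u + y ** 2 * v, x ** 2 * v + y * u + 1)

def J_xy(x, y, u, v):
    """Partial derivatives of (F1, F2) with respect to (x, y)."""
    return [[u, 2 * y * v], [2 * x * v, u]]

def J_uv(x, y, u, v):
    """Partial derivatives of (F1, F2) with respect to (u, v)."""
    return [[x, y ** 2], [y, x ** 2]]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix: the theorem does not apply here")
    return [[d / det, -b / det], [-c / det, a / det]]

def implicit_jacobian(x, y, u, v):
    """[[u_x, u_y], [v_x, v_y]] = -J_uv^{-1} J_xy."""
    A, B = inverse_2x2(J_uv(x, y, u, v)), J_xy(x, y, u, v)
    return [[-sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def solve_uv(x, y, u, v, iterations=20):
    """Newton's method for F(x, y, u, v) = 0 in the unknowns (u, v)."""
    for _ in range(iterations):
        r = F(x, y, u, v)
        Jinv = inverse_2x2(J_uv(x, y, u, v))
        u, v = (u - (Jinv[0][0] * r[0] + Jinv[0][1] * r[1]),
                v - (Jinv[1][0] * r[0] + Jinv[1][1] * r[1]))
    return u, v

def finite_difference_jacobian(x, y, u, v, h=1e-6):
    """Central differences of the implicitly defined (u, v) near (x, y, u, v)."""
    up_x, dn_x = solve_uv(x + h, y, u, v), solve_uv(x - h, y, u, v)
    up_y, dn_y = solve_uv(x, y + h, u, v), solve_uv(x, y - h, u, v)
    return [[(up_x[0] - dn_x[0]) / (2 * h), (up_y[0] - dn_y[0]) / (2 * h)],
            [(up_x[1] - dn_x[1]) / (2 * h), (up_y[1] - dn_y[1]) / (2 * h)]]

p = (0.0, 1.0, -1.0, 0.0)   # satisfies F(p) = (0, 0)
print(F(*p))
print(implicit_jacobian(*p))
print(finite_difference_jacobian(*p))
```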

1.7 Directional Derivatives and the Gradient Vector

Previously, we defined the gradient as the vector of all of the first partial derivatives of a scalar-valued function of several variables. Now, we will learn how to use the gradient to measure the rate of change of the function as its variables change in any direction, as opposed to a change in a single variable. This is extremely useful in applications in which the minimum or maximum value of a function is sought. We will also learn how the gradient can be used to easily describe tangent planes to level surfaces, thus providing an alternative to implicit differentiation or the Chain Rule.


1.7.1 The Gradient Vector

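Let $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ be a scalar-valued function of $n$ variables $x_1, x_2, \ldots, x_n$. Recall that the vector of its first partial derivatives,

$$\nabla f = \begin{bmatrix} f_{x_1} & f_{x_2} & \cdots & f_{x_n} \end{bmatrix},$$

is called the gradient of $f$.

Example Let $f(x, y, z) = e^{-(x^2+y^2)} \cos z$. Then

$$\nabla f = \begin{bmatrix} -2x e^{-(x^2+y^2)} \cos z & -2y e^{-(x^2+y^2)} \cos z & -e^{-(x^2+y^2)} \sin z \end{bmatrix}.$$

Therefore, at the point $(x_0, y_0, z_0) = (1, 2, \pi/3)$, the gradient is the vector

$$\nabla f(x_0, y_0, z_0) = \begin{bmatrix} f_x(1, 2, \pi/3) & f_y(1, 2, \pi/3) & f_z(1, 2, \pi/3) \end{bmatrix} = \left\langle -e^{-5}, -2e^{-5}, -\frac{\sqrt{3}}{2} e^{-5} \right\rangle.$$

□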
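The following is a short check of this gradient. It compares the analytic components with the closed-form values at $(1, 2, \pi/3)$ and with central differences. `numerical_gradient` is an illustrative helper, not part of the text.

```python
import math

def f(x, y, z):
    return math.exp(-(x ** 2 + y ** 2)) * math.cos(z)

def grad_f(x, y, z):
    """Analytic gradient from the example."""
    e = math.exp(-(x ** 2 + y ** 2))
    return [-2 * x * e * math.cos(z), -2 * y * e * math.cos(z), -e * math.sin(z)]

def numerical_gradient(func, point, h=1e-6):
    """Central differences in each coordinate direction."""
    grad = []
    for i in range(len(point)):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        grad.append((func(*plus) - func(*minus)) / (2 * h))
    return grad

p = (1.0, 2.0, math.pi / 3)
expected = [-math.exp(-5), -2 * math.exp(-5), -math.sqrt(3) / 2 * math.exp(-5)]
print(grad_f(*p))
print(expected)
print(numerical_gradient(f, p))
```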

It should be noted that various differentiation rules from single-variable calculus have direct generalizations to the gradient. Let $u$ and $v$ be differentiable functions defined on $\mathbb{R}^n$. Then, we have:

• Linearity:

$$\nabla(au + bv) = a\nabla u + b\nabla v,$$

where $a$ and $b$ are constants

• Product Rule:

$$\nabla(uv) = u\nabla v + v\nabla u$$

• Quotient Rule:

$$\nabla\left(\frac{u}{v}\right) = \frac{v\nabla u - u\nabla v}{v^2}, \quad \text{wherever } v \neq 0$$

• Power Rule:

$$\nabla u^n = n u^{n-1} \nabla u$$


1.7.2 Directional Derivatives

The components of the gradient vector $\nabla f$ represent the instantaneous rates of change of the function $f$ with respect to each one of its independent variables. However, in many applications, it is useful to know how $f$ changes as its variables change in any direction from a given point. To that end, given $f : D \subseteq \mathbb{R}^2 \to \mathbb{R}$ and a unit vector $\mathbf{u} = \langle a, b \rangle \in \mathbb{R}^2$, we define the directional derivative of $f$ at $(x_0, y_0) \in D$ in the direction of $\mathbf{u}$ to be

$$D_{\mathbf{u}} f(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0 + ah, y_0 + bh) - f(x_0, y_0)}{h}.$$

When $\mathbf{u} = \mathbf{i} = \langle 1, 0 \rangle$, we have $D_{\mathbf{u}} f = f_x$, and when $\mathbf{u} = \mathbf{j} = \langle 0, 1 \rangle$, we have $D_{\mathbf{u}} f = f_y$. For general $\mathbf{u}$, $D_{\mathbf{u}} f(x_0, y_0)$ represents the instantaneous rate of change of $f$ as $(x, y)$ changes in the direction of $\mathbf{u}$ from the point $(x_0, y_0)$.

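Because it is cumbersome to compute a directional derivative using the definition directly, it is desirable to be able to relate the directional derivative to the partial derivatives, which can be computed easily using differentiation rules. Assume that $f_x$ is continuous near $(x_0, y_0)$. This is what allows the first difference quotient below, in which $y$ is also shifted by $bh$, to converge to $f_x(x_0, y_0)$. Assume also that $a$ and $b$ are nonzero; otherwise the corresponding term simply vanishes. Then we have

$$
\begin{aligned}
D_{\mathbf{u}} f(x_0, y_0) &= \lim_{h \to 0} \frac{f(x_0 + ah, y_0 + bh) - f(x_0, y_0)}{h} \\
&= \lim_{h \to 0} \frac{f(x_0 + ah, y_0 + bh) - f(x_0, y_0 + bh) + f(x_0, y_0 + bh) - f(x_0, y_0)}{h} \\
&= \lim_{h \to 0} \left[ \frac{f(x_0 + ah, y_0 + bh) - f(x_0, y_0 + bh)}{h} + \frac{f(x_0, y_0 + bh) - f(x_0, y_0)}{h} \right] \\
&= \lim_{h \to 0} \left[ \frac{f(x_0 + ah, y_0 + bh) - f(x_0, y_0 + bh)}{ah}\, a + \frac{f(x_0, y_0 + bh) - f(x_0, y_0)}{bh}\, b \right] \\
&= f_x(x_0, y_0)\, a + f_y(x_0, y_0)\, b \\
&= \nabla f(x_0, y_0) \cdot \mathbf{u}.
\end{aligned}
$$

That is, the directional derivative in the direction of $\mathbf{u}$ is the dot product of the gradient with $\mathbf{u}$. It can be shown that this is the case for any number of variables. Given $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ differentiable at $\mathbf{x}_0$ and a unit vector $\mathbf{u} \in \mathbb{R}^n$, the directional derivative of $f$ at $\mathbf{x}_0$ in the direction of $\mathbf{u}$ is given by

$$D_{\mathbf{u}} f(\mathbf{x}_0) = \nabla f(\mathbf{x}_0) \cdot \mathbf{u}.$$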
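The identity $D_{\mathbf{u}} f(\mathbf{x}_0) = \nabla f(\mathbf{x}_0) \cdot \mathbf{u}$ is easy to test numerically. The sketch below approximates the limit in the definition by a symmetric difference quotient, which has the same limit for differentiable $f$, and compares it with the dot product. The test function $f(x, y) = x^2 y + \sin y$, the point and the direction are hypothetical choices, not taken from the text.

```python
import math

def directional_derivative_limit(f, x0, u, h=1e-6):
    """Symmetric-difference approximation of the limit defining D_u f(x0)."""
    plus = [xi + h * ui for xi, ui in zip(x0, u)]
    minus = [xi - h * ui for xi, ui in zip(x0, u)]
    return (f(*plus) - f(*minus)) / (2 * h)

def directional_derivative_gradient(grad, x0, u):
    """D_u f(x0) = grad f(x0) . u for a unit vector u."""
    return sum(g * ui for g, ui in zip(grad(*x0), u))

# Hypothetical test function (not from the text): f(x, y) = x^2 y + sin(y).
f = lambda x, y: x ** 2 * y + math.sin(y)
grad = lambda x, y: [2 * x * y, x ** 2 + math.cos(y)]

theta = 0.7
u = [math.cos(theta), math.sin(theta)]   # a unit vector
x0 = [1.5, -0.4]
print(directional_derivative_limit(f, x0, u))
print(directional_derivative_gradient(grad, x0, u))
```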


Because the dot product $\mathbf{a} \cdot \mathbf{b}$ can also be defined as

$$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \|\mathbf{b}\| \cos \theta,$$

where $\theta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$, the directional derivative can be used to determine the direction along which $f$ increases most rapidly, decreases most rapidly, or does not change at all.

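We first note that if $\nabla f(\mathbf{x}_0) \neq \mathbf{0}$ and $\theta$ is the angle between $\nabla f(\mathbf{x}_0)$ and $\mathbf{u}$, then, because $\|\mathbf{u}\| = 1$,

$$D_{\mathbf{u}} f(\mathbf{x}_0) = \nabla f(\mathbf{x}_0) \cdot \mathbf{u} = \|\nabla f(\mathbf{x}_0)\| \cos \theta.$$

Then we have the following:

• When $\theta = 0$, $\cos \theta = 1$, so $D_{\mathbf{u}} f$ is maximized, and its value is $\|\nabla f(\mathbf{x}_0)\|$. In this case,

$$\mathbf{u} = \frac{\nabla f(\mathbf{x}_0)}{\|\nabla f(\mathbf{x}_0)\|},$$

and this is called the direction of steepest ascent.

• When $\theta = \pi$, $\cos \theta = -1$, so $D_{\mathbf{u}} f$ is minimized, and its value is $-\|\nabla f(\mathbf{x}_0)\|$. In this case,

$$\mathbf{u} = -\frac{\nabla f(\mathbf{x}_0)}{\|\nabla f(\mathbf{x}_0)\|},$$

and this is called the direction of steepest descent.

• When $\theta = \pm\pi/2$, $\cos \theta = 0$, so $D_{\mathbf{u}} f = 0$. In this case, $\mathbf{u}$ is a unit vector that is orthogonal (perpendicular) to $\nabla f(\mathbf{x}_0)$. Since $f$ is not changing at all along this direction, it follows that $\mathbf{u}$ is tangent to the level set of $f$ through $\mathbf{x}_0$, on which $f(\mathbf{x}) = f(\mathbf{x}_0)$.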

The direction of steepest descent is of particular interest in applications in which the goal is to find the minimum value of $f$. From a starting point $\mathbf{x}_0$, one can choose a new point $\mathbf{x}_1 = \mathbf{x}_0 + \alpha \mathbf{u}$, where $\mathbf{u} = -\nabla f(\mathbf{x}_0)$ points in the direction of steepest descent and $\alpha > 0$ is chosen so as to minimize $f(\mathbf{x}_1)$. Then, this process can be repeated using the direction of steepest descent at $\mathbf{x}_1$, which is $-\nabla f(\mathbf{x}_1)$, to compute a new point $\mathbf{x}_2$, and so on, until a minimum is found. This process is called the method of steepest descent.

While this method is not used very often in practice, it serves as a useful building block for some of the most powerful methods that are used in practice for minimizing functions.
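The method of steepest descent can be sketched in a few lines. To keep the line search exact, the sketch assumes a quadratic objective, $f(\mathbf{p}) = \frac{1}{2}\mathbf{p}^T H \mathbf{p} - \mathbf{b}^T \mathbf{p}$. For such an objective, the minimizing step along $-\mathbf{g}$ is $\alpha = (\mathbf{g} \cdot \mathbf{g})/(\mathbf{g} \cdot H\mathbf{g})$. For a general $f$ this formula does not apply, and a numerical line search would be needed instead. We borrow $f(x, y) = 6x^2 + 4xy + 8y^2 - x - 3y$ from the example in Section 1.8, whose minimum is at $(1/44, 2/11)$.

```python
# Objective borrowed from the example in Section 1.8:
# f(x, y) = 6x^2 + 4xy + 8y^2 - x - 3y = 0.5 p^T H p - b^T p.
H = [[12.0, 4.0], [4.0, 16.0]]
b = [1.0, 3.0]

def grad(p):
    """grad f(p) = H p - b."""
    return [H[0][0] * p[0] + H[0][1] * p[1] - b[0],
            H[1][0] * p[0] + H[1][1] * p[1] - b[1]]

def steepest_descent(p, tol=1e-12, max_iter=1000):
    """Steepest descent with an exact line search (exact only for quadratic f)."""
    for k in range(max_iter):
        g = grad(p)
        gg = g[0] ** 2 + g[1] ** 2
        if gg ** 0.5 < tol:
            return p, k
        Hg = [H[0][0] * g[0] + H[0][1] * g[1], H[1][0] * g[0] + H[1][1] * g[1]]
        alpha = gg / (g[0] * Hg[0] + g[1] * Hg[1])   # minimizes f(p - alpha g) over alpha
        p = [p[0] - alpha * g[0], p[1] - alpha * g[1]]
    return p, max_iter

point, iterations = steepest_descent([1.0, 1.0])
print(point, iterations)   # point should be close to (1/44, 2/11)
```

Because $H$ here is well conditioned, the iteration converges quickly. For badly conditioned problems, the characteristic zigzagging of steepest descent makes it much slower.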


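Example Let $f(x, y) = x^2 y + y^3$, and let $(x_0, y_0) = (2, -2)$. Then

$$\nabla f(x, y) = \begin{bmatrix} f_x(x, y) & f_y(x, y) \end{bmatrix} = \begin{bmatrix} 2xy & x^2 + 3y^2 \end{bmatrix},$$

which yields $\nabla f(x_0, y_0) = \langle f_x(2, -2), f_y(2, -2) \rangle = \langle -8, 16 \rangle$. It follows that the direction of steepest ascent is

$$\mathbf{u} = \frac{\nabla f(2, -2)}{\|\nabla f(2, -2)\|} = \frac{\langle -8, 16 \rangle}{\sqrt{(-8)^2 + 16^2}} = \frac{\langle -8, 16 \rangle}{\sqrt{320}} = \frac{\langle -8, 16 \rangle}{8\sqrt{5}} = \left\langle -\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}} \right\rangle.$$

For this $\mathbf{u}$, we have $D_{\mathbf{u}} f(2, -2) = \|\nabla f(2, -2)\| = 8\sqrt{5}$.

Furthermore, the direction of steepest descent is

$$\mathbf{u} = \left\langle \frac{1}{\sqrt{5}}, -\frac{2}{\sqrt{5}} \right\rangle,$$

and along this direction, we have $D_{\mathbf{u}} f(2, -2) = -\|\nabla f(2, -2)\| = -8\sqrt{5}$. Finally, the directions along which $f$ does not change at all are those that are orthogonal to the directions of steepest ascent and descent,

$$\mathbf{u} = \pm \left\langle \frac{2}{\sqrt{5}}, \frac{1}{\sqrt{5}} \right\rangle.$$

The level curve defined by the equation $f(x, y) = f(2, -2) = -16$ proceeds along these directions from the point $(2, -2)$. □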
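These directions can be computed mechanically from the gradient. In the sketch below, the helper names are hypothetical. A unit vector tangent to the level curve is obtained by rotating the unit gradient by $90^\circ$, which gives one of the two choices $\pm\langle 2/\sqrt{5}, 1/\sqrt{5} \rangle$.

```python
import math

def grad_f(x, y):
    """Gradient of f(x, y) = x^2 y + y^3 from the example."""
    return (2 * x * y, x ** 2 + 3 * y ** 2)

def special_directions(x0, y0):
    """Max rate of increase, and unit directions of steepest ascent, descent, and no change."""
    gx, gy = grad_f(x0, y0)
    norm = math.hypot(gx, gy)
    ascent = (gx / norm, gy / norm)
    descent = (-ascent[0], -ascent[1])
    level = (-ascent[1], ascent[0])   # ascent rotated by 90 degrees
    return norm, ascent, descent, level

norm, ascent, descent, level = special_directions(2.0, -2.0)
print(norm, 8 * math.sqrt(5))
print(ascent, descent, level)
```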

1.7.3 Tangent Planes to Level Surfaces

Let $F : D \subseteq \mathbb{R}^3 \to \mathbb{R}$ be a function of three variables $x$, $y$ and $z$ that implicitly defines a surface through the equation $F(x, y, z) = 0$, and let $(x_0, y_0, z_0)$ be a point on that surface. Suppose $F$ satisfies the conditions of the Implicit Function Theorem at $(x_0, y_0, z_0)$. Then $z$ is implicitly defined as a function of $x$ and $y$ near this point, and this fact can be used to obtain the equation of the plane that is tangent to the surface at this point. It then follows that the equation of the tangent plane is

$$z - z_0 = z_x(x_0, y_0)(x - x_0) + z_y(x_0, y_0)(y - y_0),$$

where, by the Chain Rule,

$$z_x(x_0, y_0) = -\frac{F_x(x_0, y_0, z_0)}{F_z(x_0, y_0, z_0)}, \quad z_y(x_0, y_0) = -\frac{F_y(x_0, y_0, z_0)}{F_z(x_0, y_0, z_0)}.$$


This is not possible if $F_z(x_0, y_0, z_0) = 0$, because then the Implicit Function Theorem does not apply.

It would be desirable to be able to obtain the equation of the tangent plane even if $F_z(x_0, y_0, z_0) = 0$. The level surface still has a tangent plane at that point, even though $z$ cannot be implicitly defined as a function of $x$ and $y$. To that end, we note that any direction $\mathbf{u}$ within the tangent plane is parallel to the tangent vector of some curve that lies within the surface and passes through $(x_0, y_0, z_0)$. Because $F(x, y, z) = 0$ on this surface, it follows that $D_{\mathbf{u}} F(x_0, y_0, z_0) = 0$. This implies that $\nabla F(x_0, y_0, z_0)$ must be orthogonal to $\mathbf{u}$, in view of

$$D_{\mathbf{u}} F(x_0, y_0, z_0) = \nabla F(x_0, y_0, z_0) \cdot \mathbf{u} = 0.$$

Since this is the case for any direction $\mathbf{u}$ within the tangent plane, we conclude that $\nabla F(x_0, y_0, z_0)$, assumed nonzero, is normal to the tangent plane. Therefore the equation of this plane is

$$F_x(x_0, y_0, z_0)(x - x_0) + F_y(x_0, y_0, z_0)(y - y_0) + F_z(x_0, y_0, z_0)(z - z_0) = 0.$$

Note that this equation is equivalent to the one obtained using the Chain Rule when $F_z(x_0, y_0, z_0) \neq 0$.

The gradient provides not only the normal vector to the tangent plane, but also the direction numbers of the normal line to the surface at $(x_0, y_0, z_0)$. The normal line is the line that passes through the surface at this point and is perpendicular to the tangent plane. The equation of this line, in parametric form, is

$$x = x_0 + tF_x(x_0, y_0, z_0), \quad y = y_0 + tF_y(x_0, y_0, z_0), \quad z = z_0 + tF_z(x_0, y_0, z_0).$$

Example Let $F(x, y, z) = x^2 + y^2 + z^2 - 2x - 4y - 4$. Then the equation $F(x, y, z) = 0$ defines a sphere of radius 3 centered at $(1, 2, 0)$, since it can be rewritten as $(x - 1)^2 + (y - 2)^2 + z^2 = 9$. At the point $(x_0, y_0, z_0) = (3, 3, 2)$, we have

$$\nabla F(x_0, y_0, z_0) = \begin{bmatrix} F_x(x_0, y_0, z_0) & F_y(x_0, y_0, z_0) & F_z(x_0, y_0, z_0) \end{bmatrix} = \begin{bmatrix} 2x_0 - 2 & 2y_0 - 4 & 2z_0 \end{bmatrix} = \langle 4, 2, 4 \rangle.$$

It follows that the equation of the plane that is tangent to the sphere at $(3, 3, 2)$ is

$$4(x - 3) + 2(y - 3) + 4(z - 2) = 0,$$

and the equation of the normal line, in parametric form, is

$$
\begin{aligned}
x &= x_0 + tF_x(x_0, y_0, z_0) = 3 + 4t, \\
y &= y_0 + tF_y(x_0, y_0, z_0) = 3 + 2t, \\
z &= z_0 + tF_z(x_0, y_0, z_0) = 2 + 4t.
\end{aligned}
$$

Equivalently, we can describe the normal line using its symmetric equations,

$$\frac{x - 3}{4} = \frac{y - 3}{2} = \frac{z - 2}{4}.$$

□
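The tangent plane and normal line can be generated directly from $\nabla F$. The following is a minimal sketch that writes the plane as $Ax + By + Cz = D$, a representation chosen here for convenience. The helper names are hypothetical.

```python
def F(x, y, z):
    return x ** 2 + y ** 2 + z ** 2 - 2 * x - 4 * y - 4

def grad_F(x, y, z):
    return (2 * x - 2, 2 * y - 4, 2 * z)

def tangent_plane(p):
    """Coefficients (A, B, C, D) of A x + B y + C z = D, using grad F(p) as the normal."""
    A, B, C = grad_F(*p)
    return A, B, C, A * p[0] + B * p[1] + C * p[2]

def normal_line(p, t):
    """The point p + t grad F(p) on the normal line."""
    n = grad_F(*p)
    return tuple(pi + t * ni for pi, ni in zip(p, n))

p0 = (3.0, 3.0, 2.0)
print(F(*p0))              # 0.0, so p0 lies on the sphere
print(tangent_plane(p0))   # (4.0, 2.0, 4.0, 26.0)
print(normal_line(p0, 1.0))
```

Expanding $4(x - 3) + 2(y - 3) + 4(z - 2) = 0$ gives $4x + 2y + 4z = 26$, which matches the printed coefficients.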

1.8 Maximum and Minimum Values

In single-variable calculus, one learns how to compute maximum and minimum values of a function. We first recall these methods, and then we will learn how to generalize them to functions of several variables.

Let $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$. We say that $f$ has a local maximum at a point $\mathbf{a} \in D$ if $f(\mathbf{x}) \leq f(\mathbf{a})$ for $\mathbf{x}$ near $\mathbf{a}$. The value $f(\mathbf{a})$ is called a local maximum value. Similarly, $f$ has a local minimum at $\mathbf{a}$ if $f(\mathbf{x}) \geq f(\mathbf{a})$ for $\mathbf{x}$ near $\mathbf{a}$, and the value $f(\mathbf{a})$ is called a local minimum value.

Suppose a function of a single variable, $f(x)$, has a local maximum or minimum at $x = a$. Then $a$ must be a critical point of $f$, which means that $f'(a) = 0$ or $f'$ does not exist at $a$. The latter is the case if, for example, the graph of $f$ has a sharp corner at $a$. In general, if $f$ is differentiable at an interior point $\mathbf{a}$ of $D$, then in order for $\mathbf{a}$ to be a local maximum or minimum of $f$, the rate of change of $f$, as its independent variables change in any direction, must be zero. The only way to ensure this is to require that $\nabla f(\mathbf{a}) = \mathbf{0}$. Therefore, we say that $\mathbf{a}$ is a critical point if $\nabla f(\mathbf{a}) = \mathbf{0}$ or if any partial derivative of $f$ does not exist at $\mathbf{a}$.

Once we have found the critical points of a function, we must determine whether they correspond to local maxima or minima. In the single-variable case, we can use the Second Derivative Test. It states that if $a$ is a critical point of $f$ and $f''(a) > 0$, then $a$ is a local minimum; if $f''(a) < 0$, then $a$ is a local maximum; and if $f''(a) = 0$, the test is inconclusive.

This test is generalized to the multivariable case as follows. First, we form the Hessian, which is the matrix of second partial derivatives at $\mathbf{a}$. If $f$ is a function of $n$ variables, then the Hessian is an $n \times n$ matrix $H$, and the entry in row $i$, column $j$ of $H$ is defined by

$$H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{a}).$$

Because mixed second partial derivatives are equal if they are continuous, it follows that $H$ is a symmetric matrix, meaning that $H_{ij} = H_{ji}$.

We can now state the Second Derivatives Test. If $\mathbf{a}$ is a critical point of $f$, and the Hessian, $H$, is positive definite, then $f$ has a local minimum at $\mathbf{a}$. The notion of a matrix being positive definite is the generalization to matrices of the notion of a positive number. When a matrix $H$ is symmetric, the following statements are all equivalent:

• $H$ is positive definite.

• $\mathbf{x}^T H \mathbf{x} > 0$ for every nonzero column vector $\mathbf{x}$ of real numbers, where $\mathbf{x}^T$ is the transpose of $\mathbf{x}$, which is a row vector.

• The eigenvalues of $H$ are positive.

If $H$ is positive definite, then its determinant is positive and its diagonal entries, $H_{ii}$ for $i = 1, 2, \ldots, n$, are positive. However, these properties are consequences of positive definiteness rather than equivalent to it. For example, the symmetric matrix with every diagonal entry equal to 1 and every off-diagonal entry equal to 2 has positive diagonal entries and determinant 5, but its eigenvalues are 5, $-1$ and $-1$.

On the other hand, if $H$ is negative definite, then $f$ has a local maximum at $\mathbf{a}$. This means that $\mathbf{x}^T H \mathbf{x} < 0$ for any nonzero real vector $\mathbf{x}$, or equivalently that the eigenvalues of $H$ are negative. It follows that the diagonal entries of $H$ are negative as well. However, the determinant is not necessarily negative. Because it is equal to the product of the eigenvalues, the determinant is positive if $n$ is even, and negative if $n$ is odd.

If $H$ is indefinite, meaning that it has both positive and negative eigenvalues, then we say that $f$ has a saddle point at $\mathbf{a}$. This means that the graph of $f$ crosses its tangent plane at $\mathbf{a}$. The term "saddle point" arises from the fact that $f$ is increasing from $\mathbf{a}$ along some directions, but decreasing along others.

Finally, suppose $H$ is singular, meaning that at least one of its eigenvalues, and therefore its determinant, is equal to zero, and $H$ is not indefinite. Then the test is inconclusive, and $\mathbf{a}$ could be a local minimum, local maximum, saddle point, or none of the above. One must instead use other information about $f$, such as its directional derivatives, to determine whether $f$ has a maximum, minimum or saddle point at $\mathbf{a}$.


f(x, y) = 6x2 + 4xy + 8y2 − x− 3y.


We wish to find any local minima or maxima of this function. First, wecompute its gradient,

∇f =[

12x+ 4y − 1 4x+ 16y − 3].

To determine where ∇f = 0, we must solve the system of linear equations

12x+ 4y = 1,

4x+ 16y = 3.

Using the second equation to obtain x = (3 − 16y)/4 and substituting thisinto the first equation, we obtain y = 2/11 and x = 1/44. Since the solutionof this system is unique, it follows that this is the only critical point of f .

To determine whether this critical point corresponds to a maximum orminimum, we must compute the Hessian H, whose entries are the secondpartial derivatives of f at (1/44, 2/11). We have

H =

[fxx fxyfyx fyy

]=

[12 44 16

].

To determine whether this matrix is positive definite, we first compute itsdeterminant,

det(H) = fxxfyy − f2xy = 12(16)− 4(4) = 176.

Since the determinant, which is the product of H’s two eigenvalues, is pos-itive, it follows that they must both be the same sign. To determine thatsign, we check the trace of H, denoted by tr(H). The trace of a matrix isthe sum of its diagonal entries, which is also the sum of the eigenvalues. Wehave

tr(H) = fxx + fyy = 12 + 16 = 28.

Since both eigenvalues are the same sign, and their sum is positive, theymust both be positive. Therefore, H is positive definite, and we concludethat (2/11, 1/44) is a local minimum of f . 2
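Because $\nabla f = \mathbf{0}$ is a linear system here, the critical point, the determinant of $H$ and its trace can all be computed exactly with rational arithmetic. The following sketch uses Python's `fractions` module and Cramer's rule. Cramer's rule is our choice of method; the text uses substitution.

```python
from fractions import Fraction

# For f(x, y) = 6x^2 + 4xy + 8y^2 - x - 3y, grad f = 0 is the linear system H p = rhs.
H = [[Fraction(12), Fraction(4)], [Fraction(4), Fraction(16)]]
rhs = [Fraction(1), Fraction(3)]

def solve_2x2(A, r):
    """Cramer's rule for the 2x2 system A p = r (A assumed nonsingular)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x = (r[0] * A[1][1] - A[0][1] * r[1]) / det
    y = (A[0][0] * r[1] - r[0] * A[1][0]) / det
    return x, y

x0, y0 = solve_2x2(H, rhs)
det_H = H[0][0] * H[1][1] - H[0][1] * H[1][0]
trace_H = H[0][0] + H[1][1]
print(x0, y0)          # 1/44 2/11
print(det_H, trace_H)  # 176 28
# det > 0 means both eigenvalues share a sign; trace > 0 makes that sign positive.
print("positive definite" if det_H > 0 and trace_H > 0 else "not positive definite")
```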

The preceding example describes how the Second Derivatives Test can be performed for a function of two variables:

• If $\det(H) = f_{xx} f_{yy} - f_{xy}^2 > 0$ and $f_{xx} > 0$, then the critical point is a local minimum.

• If $\det(H) > 0$ and $f_{xx} < 0$, then the critical point is a local maximum.

• If $\det(H) < 0$, then the critical point is a saddle point.

• If $\det(H) = 0$, then the test is inconclusive.
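The two-variable test above translates directly into a small function. The sketch applies it to the Hessian entries, at the critical point $(0, 0)$, of a few hypothetical functions chosen to exercise each case. It also applies the test to the constant Hessian from the preceding example.

```python
def second_derivatives_test(fxx, fyy, fxy):
    """Classify a critical point of f(x, y) from the second partial derivatives there."""
    det = fxx * fyy - fxy ** 2
    if det > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if det < 0:
        return "saddle point"
    return "inconclusive"

# (f_xx, f_yy, f_xy) at a critical point; the first four functions are hypothetical
# illustrations, each with a critical point at (0, 0).
examples = {
    "x^2 + y^2": (2, 2, 0),
    "-x^2 - y^2": (-2, -2, 0),
    "x^2 - y^2": (2, -2, 0),
    "x^4 + y^4": (0, 0, 0),
    "6x^2 + 4xy + 8y^2 - x - 3y": (12, 16, 4),   # Hessian is constant for this quadratic
}
for name, (fxx, fyy, fxy) in examples.items():
    print(name, "->", second_derivatives_test(fxx, fyy, fxy))
```

The case $x^4 + y^4$ shows why $\det(H) = 0$ is inconclusive. The origin is in fact a minimum of this function, but $-x^4 - y^4$ has the same zero Hessian there and a maximum instead.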

In many applications, it is desirable to know where a function assumes its largest or smallest values, not just among nearby points, but within its entire domain. We say that a function $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ has an absolute maximum at $\mathbf{a}$ if $f(\mathbf{a}) \geq f(\mathbf{x})$ for all $\mathbf{x} \in D$. Similarly, $f$ has an absolute minimum at $\mathbf{a}$ if $f(\mathbf{a}) \leq f(\mathbf{x})$ for all $\mathbf{x} \in D$.

In the single-variable case, the Extreme Value Theorem states that if $f$ is continuous on a closed interval $[a, b]$, then it has an absolute maximum and an absolute minimum on $[a, b]$. To find them, it is necessary to check all critical points in $[a, b]$ and the endpoints $a$ and $b$, as the absolute maximum and absolute minimum must each occur at one of these points.

The generalization of a closed interval to the multivariable case is the notion of a compact set. Previously, we defined an open set and a boundary point. A closed set is a set that contains all of its boundary points. A bounded set is a set that is contained entirely within a ball $D_r(\mathbf{x}_0)$ for some choice of $r$ and $\mathbf{x}_0$. Finally, a set is compact if it is closed and bounded.

We can now state the generalization of the Extreme Value Theorem to the multivariable case: a continuous function on a compact set has an absolute minimum and an absolute maximum. Therefore, given such a compact set $D$, to find the absolute maximum and minimum, it is sufficient to check the critical points of $f$ in $D$ and to find the extreme (maximum and minimum) values of $f$ on the boundary. The largest of all of these values is the absolute maximum value, and the smallest is the absolute minimum value.

It should be noted that $D$ may have a simple shape, such as a rectangle, triangle or cube. In such cases, boundary points can be checked by characterizing them using one or more equations, using these equations to eliminate a variable, and then substituting for the eliminated variable in $f$ to obtain a function of one less variable. The extreme values on the boundary can then be found by solving a maximization or minimization problem in one less dimension.

Example Consider the function $f(x, y) = x^2 + 3y^2 - 4x - 6y$. We will find the absolute maximum and minimum values of this function on the triangle with vertices $(0, 0)$, $(4, 0)$ and $(0, 3)$.

First, we look for critical points. We have

$$\nabla f = \begin{bmatrix} 2x - 4 & 6y - 6 \end{bmatrix}.$$

We see that there is only one critical point, at $(x_0, y_0) = (2, 1)$. The triangle consists of the points that satisfy the inequalities $x \geq 0$, $y \geq 0$ and $y \leq 3 - 3x/4$. The point $(2, 1)$ satisfies all of these inequalities, so we conclude that this point lies within the triangle. It is therefore a candidate for an absolute maximum or minimum.

We now check the boundary by examining each edge of the triangle individually. On the edge between $(0, 0)$ and $(0, 3)$, we have $x = 0$, which yields $f(0, y) = 3y^2 - 6y$. We then have $f_y(0, y) = 6y - 6$, which has a critical point at $y = 1$. Therefore, $(0, 1)$ is also a candidate for an absolute extremum. Similarly, along the edge between $(0, 0)$ and $(4, 0)$, we have $y = 0$, which yields $f(x, 0) = x^2 - 4x$. We then have $f_x(x, 0) = 2x - 4$, which has a critical point at $x = 2$. Therefore, $(2, 0)$ is a candidate for an absolute extremum.

We then check the edge between $(0, 3)$ and $(4, 0)$, along which $y = 3 - 3x/4$. Substituting this into $f(x, y)$ yields the function

$$g(x) = f\left(x, 3 - \frac{3x}{4}\right) = \frac{43}{16} x^2 + 9 - 13x.$$

To determine the critical points of this function, we solve $g'(x) = 0$, which yields $x = 104/43$. Since $y = 3 - 3x/4$ along this edge, the point $(104/43, 51/43)$ is a candidate for an absolute extremum.

Finally, we must include the vertices of the triangle. They too are boundary points of the triangle, as well as boundary points of the edges along which we attempted to find extrema of single-variable functions. In all, we have seven candidates:

• the critical point of $f$, $(2, 1)$;

• the three critical points found along the edges, $(0, 1)$, $(2, 0)$ and $(104/43, 51/43)$;

• the three vertices, $(0, 0)$, $(4, 0)$ and $(0, 3)$.

Evaluating $f(x, y)$ at all of these points, we obtain

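$$
\begin{array}{cc|c}
x & y & f(x, y) \\ \hline
2 & 1 & -7 \\
0 & 1 & -3 \\
2 & 0 & -4 \\
104/43 & 51/43 & -289/43 \\
0 & 0 & 0 \\
4 & 0 & 0 \\
0 & 3 & 9
\end{array}
$$

We conclude that the absolute minimum value is $-7$, attained at $(2, 1)$, and the absolute maximum value is $9$, attained at $(0, 3)$. The function is shown in Figure 1.5. □

Figure 1.5: The function $f(x, y) = x^2 + 3y^2 - 4x - 6y$ on the triangle with vertices $(0, 0)$, $(4, 0)$ and $(0, 3)$.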
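The candidate-comparison step can be reproduced with exact rational arithmetic. The sketch below evaluates $f$ at the seven candidates listed above. As an extra, hypothetical sanity check that is not part of the method in the text, `grid_extremes` samples a grid over the closed triangle. No sampled value should fall outside $[-7, 9]$.

```python
from fractions import Fraction as Fr

def f(x, y):
    return x ** 2 + 3 * y ** 2 - 4 * x - 6 * y

candidates = [
    (Fr(2), Fr(1)),                                   # interior critical point
    (Fr(0), Fr(1)), (Fr(2), Fr(0)),                   # critical points on x = 0 and y = 0
    (Fr(104, 43), Fr(51, 43)),                        # critical point on y = 3 - 3x/4
    (Fr(0), Fr(0)), (Fr(4), Fr(0)), (Fr(0), Fr(3)),   # vertices
]

values = {point: f(*point) for point in candidates}
argmin = min(values, key=values.get)
argmax = max(values, key=values.get)
print("absolute minimum", values[argmin], "at", argmin)
print("absolute maximum", values[argmax], "at", argmax)

def grid_extremes(n=200):
    """Brute-force sampling of the closed triangle, as an independent sanity check."""
    samples = [f(4 * i / n, 3 * j / n)
               for i in range(n + 1) for j in range(n + 1)
               if 3 * j / n <= 3 - 0.75 * (4 * i / n) + 1e-12]
    return min(samples), max(samples)

print(grid_extremes())
```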

Previously, we learned how the Second Derivative Test generalizes when seeking a local minimum or maximum of a function of several variables. In the single-variable test, the sign of the second derivative indicates whether a local extremum is a maximum or minimum. It generalizes to the Second Derivatives Test: a critical point $\mathbf{x}_0$ is a local minimum if the Hessian, the matrix of second partial derivatives, is positive definite at $\mathbf{x}_0$.

We will now use Taylor series to explain why this test is effective. Recall that in single-variable calculus, Taylor's Theorem states that a function $f(x)$ with at least three continuous derivatives near $x_0$ can be written as

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2 + \frac{1}{6} f'''(\xi)(x - x_0)^3,$$

where $\xi$ is between $x$ and $x_0$. In the multivariable case, Taylor's Theorem states that if $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ has continuous third partial derivatives near $\mathbf{x}_0 \in D$, then

$$f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0) + \frac{1}{2} (\mathbf{x} - \mathbf{x}_0) \cdot H_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) + R_2(\mathbf{x}_0, \mathbf{x}),$$

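where $H_f(\mathbf{x}_0)$ is the Hessian, the matrix of second partial derivatives at $\mathbf{x}_0$, defined by

$$H_f(\mathbf{x}_0) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2}(\mathbf{x}_0) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(\mathbf{x}_0) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(\mathbf{x}_0) \\ \frac{\partial^2 f}{\partial x_2 \partial x_1}(\mathbf{x}_0) & \frac{\partial^2 f}{\partial x_2^2}(\mathbf{x}_0) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(\mathbf{x}_0) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1}(\mathbf{x}_0) & \frac{\partial^2 f}{\partial x_n \partial x_2}(\mathbf{x}_0) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(\mathbf{x}_0) \end{bmatrix},$$

and $R_2(\mathbf{x}_0, \mathbf{x})$ is the Taylor remainder, which satisfies

$$\lim_{\mathbf{x} \to \mathbf{x}_0} \frac{R_2(\mathbf{x}_0, \mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|^2} = 0.$$

If we let $\mathbf{x}_0 = (x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)})$, then Taylor's Theorem can be rewritten using summations:

$$f(\mathbf{x}) = f(\mathbf{x}_0) + \sum_{i=1}^n \frac{\partial f}{\partial x_i}(\mathbf{x}_0)(x_i - x_i^{(0)}) + \frac{1}{2} \sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x}_0)(x_i - x_i^{(0)})(x_j - x_j^{(0)}) + R_2(\mathbf{x}_0, \mathbf{x}).$$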

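Example Let $f(x, y) = x^2 y^3 + x y^4$, and let $(x_0, y_0) = (1, -2)$. Then, from partial differentiation of $f$, we obtain its gradient

$$\nabla f = \begin{bmatrix} f_x & f_y \end{bmatrix} = \begin{bmatrix} 2xy^3 + y^4 & 3x^2 y^2 + 4xy^3 \end{bmatrix},$$

and its Hessian,

$$H_f(x, y) = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} = \begin{bmatrix} 2y^3 & 6xy^2 + 4y^3 \\ 6xy^2 + 4y^3 & 6x^2 y + 12xy^2 \end{bmatrix}.$$

Therefore

$$\nabla f(1, -2) = \begin{bmatrix} 0 & -20 \end{bmatrix}, \quad H_f(1, -2) = \begin{bmatrix} -16 & -8 \\ -8 & 36 \end{bmatrix},$$

and the Taylor expansion of $f$ around $(1, -2)$ is

$$
\begin{aligned}
f(x, y) &= f(x_0, y_0) + \nabla f(x_0, y_0) \cdot \langle x - x_0, y - y_0 \rangle + \frac{1}{2} \langle x - x_0, y - y_0 \rangle \cdot H_f(x_0, y_0) \langle x - x_0, y - y_0 \rangle + R_2((x_0, y_0), (x, y)) \\
&= 8 + \begin{bmatrix} 0 & -20 \end{bmatrix} \begin{bmatrix} x - 1 \\ y + 2 \end{bmatrix} + \frac{1}{2} \langle x - 1, y + 2 \rangle \cdot \begin{bmatrix} -16 & -8 \\ -8 & 36 \end{bmatrix} \begin{bmatrix} x - 1 \\ y + 2 \end{bmatrix} + R_2((1, -2), (x, y)) \\
&= 8 - 20(y + 2) - 8(x - 1)^2 - 8(x - 1)(y + 2) + 18(y + 2)^2 + R_2((1, -2), (x, y)).
\end{aligned}
$$

All of the terms except $R_2$ together form a quadratic function that approximates $f(x, y)$ near the point $(1, -2)$. □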
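The defining property of the remainder, $R_2/\|\mathbf{x} - \mathbf{x}_0\|^2 \to 0$, can be observed numerically. The sketch below uses the second-order Taylor polynomial of $f$ about $(1, -2)$, including the factor $\frac{1}{2}$ on the Hessian term. This polynomial works out to $8 - 20(y+2) - 8(x-1)^2 - 8(x-1)(y+2) + 18(y+2)^2$. The sketch steps toward $(1, -2)$ along the arbitrarily chosen direction $(1, 1)$.

```python
def f(x, y):
    return x ** 2 * y ** 3 + x * y ** 4

def quadratic_model(x, y):
    """Second-order Taylor polynomial of f about (1, -2)."""
    dx, dy = x - 1, y + 2
    return 8 - 20 * dy - 8 * dx ** 2 - 8 * dx * dy + 18 * dy ** 2

ratios = []
for h in (1e-1, 1e-2, 1e-3):
    x, y = 1 + h, -2 + h                     # step along the direction (1, 1)
    remainder = f(x, y) - quadratic_model(x, y)
    ratios.append(remainder / (2 * h ** 2))  # R_2 / ||(x - 1, y + 2)||^2
    print(h, remainder, ratios[-1])
```

The ratio shrinks roughly in proportion to $h$, as expected when the leading error term is cubic. Dropping the factor $\frac{1}{2}$ would leave a nonzero limit instead.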

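Now, suppose that $\mathbf{x}_0$ is a critical point of $f$. If this point is to be a local minimum, then we must have $f(\mathbf{x}) \geq f(\mathbf{x}_0)$ for $\mathbf{x}$ near $\mathbf{x}_0$. Since $\nabla f(\mathbf{x}_0) = \mathbf{0}$, Taylor's Theorem gives $f(\mathbf{x}) - f(\mathbf{x}_0) = \frac{1}{2}(\mathbf{x} - \mathbf{x}_0) \cdot [H_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)] + R_2(\mathbf{x}_0, \mathbf{x})$. Because $R_2$ is negligible compared with $\|\mathbf{x} - \mathbf{x}_0\|^2$, we essentially need

$$(\mathbf{x} - \mathbf{x}_0) \cdot [H_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)] \geq 0.$$

However, if the Hessian $H_f(\mathbf{x}_0)$ is a positive definite matrix, then, by definition, this expression is actually strictly greater than zero for $\mathbf{x} \neq \mathbf{x}_0$. In fact, it is at least $\lambda_{\min} \|\mathbf{x} - \mathbf{x}_0\|^2$, where $\lambda_{\min} > 0$ is the smallest eigenvalue of $H_f(\mathbf{x}_0)$, so it dominates the remainder for $\mathbf{x}$ sufficiently near $\mathbf{x}_0$. Therefore, we are assured that $\mathbf{x}_0$ is a local minimum. Moreover, $\mathbf{x}_0$ is a strict local minimum, since we can conclude that $f(\mathbf{x}) > f(\mathbf{x}_0)$ for all $\mathbf{x} \neq \mathbf{x}_0$ sufficiently near $\mathbf{x}_0$.

As discussed previously, there are various properties possessed by symmetric positive definite matrices. One other property provides a relatively straightforward method of checking whether a matrix is positive definite: check whether the determinants of its leading principal submatrices, known as leading principal minors, are positive. Given an $n \times n$ matrix $A$, its leading principal submatrices are the submatrices consisting of its first $k$ rows and columns, for $k = 1, 2, \ldots, n$. A symmetric matrix is positive definite if and only if all of its leading principal minors are positive. Note that checking these determinants is equivalent to the test that we have previously described for determining whether a $2 \times 2$ matrix is positive definite.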

Example Let $f(x, y, z) = x^2 + y^2 + z^2 + xy$. To find any local maxima or minima of this function, we compute its gradient, which is

$$\nabla f(x, y, z) = \begin{bmatrix} 2x + y & 2y + x & 2z \end{bmatrix}.$$

It follows that the only critical point is at $(x_0, y_0, z_0) = (0, 0, 0)$. To perform the Second Derivatives Test, we compute the Hessian of $f$, which is

$$H_f(x, y, z) = \begin{bmatrix} f_{xx} & f_{xy} & f_{xz} \\ f_{yx} & f_{yy} & f_{yz} \\ f_{zx} & f_{zy} & f_{zz} \end{bmatrix} = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.$$

To determine whether this matrix is positive definite, we can compute the determinants of the leading principal submatrices of $H_f(0, 0, 0)$, which are

$$[H_f(0, 0, 0)]_{11} = 2, \quad [H_f(0, 0, 0)]_{1:2,1:2} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \quad [H_f(0, 0, 0)]_{1:3,1:3} = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.$$

For the leading principal minors, we have

$$\det([H_f(0, 0, 0)]_{11}) = 2, \quad \det([H_f(0, 0, 0)]_{1:2,1:2}) = 2(2) - 1(1) = 3,$$

$$\det([H_f(0, 0, 0)]_{1:3,1:3}) = 2 \det([H_f(0, 0, 0)]_{1:2,1:2}) = 6.$$

Since all of the leading principal minors are positive, we conclude that $H_f(0, 0, 0)$ is positive definite, and therefore the critical point is a local minimum of $f$. □
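Checking leading principal minors is easy to automate. The sketch below computes determinants by Gaussian elimination with exact fractions, which is our implementation choice. It then applies the criterion, which is only valid for symmetric matrices.

```python
from fractions import Fraction

def determinant(M):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    A = [[Fraction(v) for v in row] for row in M]
    n, det = len(A), Fraction(1)
    for k in range(n):
        pivot = next((r for r in range(k, n) if A[r][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            A[k], A[pivot] = A[pivot], A[k]   # a row swap flips the sign
            det = -det
        det *= A[k][k]
        for r in range(k + 1, n):
            factor = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= factor * A[k][c]
    return det

def leading_principal_minors(M):
    return [determinant([row[:k] for row in M[:k]]) for k in range(1, len(M) + 1)]

def is_positive_definite(M):
    """Sylvester's criterion; valid for symmetric M only."""
    return all(m > 0 for m in leading_principal_minors(M))

H = [[2, 1, 0], [1, 2, 0], [0, 0, 2]]
print(leading_principal_minors(H))   # [Fraction(2, 1), Fraction(3, 1), Fraction(6, 1)]
print(is_positive_definite(H))       # True
```

For contrast, consider the symmetric matrix with diagonal entries 1 and off-diagonal entries 2. Its leading principal minors are 1, $-3$ and 5, so it is not positive definite, even though its diagonal entries and its determinant are positive.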

1.9 Constrained Optimization

Now, we consider the problem of finding the maximum or minimum value of a function $f(\mathbf{x})$, except that the independent variables $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ are subject to one or more constraints. These constraints prevent us from using the standard approach for finding extrema, but the ideas behind the standard approach are still useful for developing an approach to the constrained problem.

We assume that the constraints are equations of the form

$$g_i(\mathbf{x}) = 0, \quad i = 1, 2, \ldots, m$$

for given functions $g_i(\mathbf{x})$. That is, when computing a maximum or minimum value of $f$, we may only consider $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ that belong to the intersection of the hypersurfaces defined by the $g_i$. These hypersurfaces are surfaces when $n = 3$, or curves when $n = 2$. For conciseness, we rewrite these constraints as a vector equation $\mathbf{g}(\mathbf{x}) = \mathbf{0}$, where $\mathbf{g} : \mathbb{R}^n \to \mathbb{R}^m$ is a vector-valued function with component functions $g_i$, for $i = 1, 2, \ldots, m$.

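By Taylor's Theorem, we have, for $\mathbf{x}_0 \in \mathbb{R}^n$ at which $\mathbf{g}$ is differentiable,

$$\mathbf{g}(\mathbf{x}) = \mathbf{g}(\mathbf{x}_0) + J_{\mathbf{g}}(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) + R_1(\mathbf{x}_0, \mathbf{x}),$$

where $J_{\mathbf{g}}(\mathbf{x}_0)$ is the Jacobian matrix of $\mathbf{g}$ at $\mathbf{x}_0$, consisting of the first partial derivatives of the $g_i$ evaluated at $\mathbf{x}_0$, and $R_1(\mathbf{x}_0, \mathbf{x})$ is the Taylor remainder, which satisfies

$$\lim_{\mathbf{x} \to \mathbf{x}_0} \frac{R_1(\mathbf{x}_0, \mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|} = \mathbf{0}.$$

Now let $\mathbf{u}$ be a vector belonging to all of the tangent spaces of the hypersurfaces defined by the $g_i$. Each $g_i$ must remain constant as $\mathbf{x}$ deviates from $\mathbf{x}_0$ in the direction of $\mathbf{u}$, so it follows that $J_{\mathbf{g}}(\mathbf{x}_0)\mathbf{u} = \mathbf{0}$. In other words, $\nabla g_i(\mathbf{x}_0) \cdot \mathbf{u} = 0$ for $i = 1, 2, \ldots, m$.

Now, suppose that $\mathbf{x}_0$ is a local minimum of $f(\mathbf{x})$, subject to the constraints $\mathbf{g}(\mathbf{x}_0) = \mathbf{0}$. Then $\mathbf{x}_0$ is not necessarily a critical point of $f$, but the rate of change of $f$ must be zero along any direction from $\mathbf{x}_0$ in which the constraints continue to be satisfied. Therefore, we must have $\nabla f(\mathbf{x}_0) \cdot \mathbf{u} = 0$ for any vector $\mathbf{u}$ in the intersection of the tangent spaces, at $\mathbf{x}_0$, of the hypersurfaces defined by the constraints.

Let $\mathbf{u}$ be any such vector in this intersection, and suppose there exist constants $\lambda_1, \lambda_2, \ldots, \lambda_m$ such that

$$\nabla f(\mathbf{x}_0) = \lambda_1 \nabla g_1(\mathbf{x}_0) + \lambda_2 \nabla g_2(\mathbf{x}_0) + \cdots + \lambda_m \nabla g_m(\mathbf{x}_0).$$

Then the requirement $\nabla f(\mathbf{x}_0) \cdot \mathbf{u} = 0$ follows directly from the fact that $\nabla g_i(\mathbf{x}_0) \cdot \mathbf{u} = 0$, and therefore $\mathbf{x}_0$ must be a constrained critical point of $f$. The constants $\lambda_1, \lambda_2, \ldots, \lambda_m$ are called Lagrange multipliers.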

Example When $m = 1$, that is, when there is only one constraint, the problem of finding a constrained minimum or maximum reduces to finding a point $\mathbf{x}_0$ in the domain of $f$ such that

$$\nabla f(\mathbf{x}_0) = \lambda \nabla g(\mathbf{x}_0),$$

for a single Lagrange multiplier $\lambda$.

Let $f(x, y) = 4x^2 + 9y^2$. The minimum value of this function is 0, which is attained at $x = y = 0$, but we wish to find the minimum of $f(x, y)$ subject to the constraint $x^2 + y^2 - 2x - 2y = 2$. That is, we must have $g(x, y) = 0$, where $g(x, y) = x^2 + y^2 - 2x - 2y - 2$. To find any points that are candidates for the constrained minimum, we compute the gradients of $f$ and $g$, which are

$$\nabla f = \begin{bmatrix} 8x & 18y \end{bmatrix}, \quad \nabla g = \begin{bmatrix} 2x - 2 & 2y - 2 \end{bmatrix}.$$

In order for the equation $\nabla f(x, y) = \lambda \nabla g(x, y)$ to be satisfied, we must have, for some choice of $\lambda$, $x$ and $y$,

$$8x = \lambda(2x - 2), \quad 18y = \lambda(2y - 2).$$

From these equations, we obtain

$$x = \frac{\lambda}{\lambda - 4}, \quad y = \frac{\lambda}{\lambda - 9}.$$

Substituting these into the constraint $x^2 + y^2 - 2x - 2y - 2 = 0$ and clearing denominators yields the fourth-degree equation

$$4\lambda^4 - 104\lambda^3 + 867\lambda^2 - 2808\lambda + 2592 = 0.$$

This equation has two real solutions,

$$\lambda_1 = \frac{3}{2}, \quad \lambda_2 \approx 13.6.$$

Substituting these values into the above equations for $x$ and $y$ yields the critical points

$$x_1 = -\frac{3}{5}, \quad y_1 = -\frac{1}{5}, \quad \lambda_1 = \frac{3}{2},$$

$$x_2 \approx 1.416626, \quad y_2 \approx 2.956124, \quad \lambda_2 \approx 13.6.$$

Substituting the $x$ and $y$ values into $f(x, y)$ yields the minimum value of $9/5$ at $(x_1, y_1)$ and the maximum value of approximately 86.675 at $(x_2, y_2)$. □
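The computation above can be reproduced with a short pure-Python sketch. It finds the real roots of the quartic by scanning for sign changes and refining each one by bisection (a numerical method chosen here for illustration; the scan interval $[-10.1, 30]$ and step $0.25$ are assumptions that happen to bracket both real roots), then recovers $x$, $y$ and $f$ from $x = \lambda/(\lambda - 4)$ and $y = \lambda/(\lambda - 9)$.

```python
def quartic(lam):
    """Quartic in lambda obtained by substituting x(lam), y(lam) into g = 0."""
    return 4 * lam**4 - 104 * lam**3 + 867 * lam**2 - 2808 * lam + 2592


def bisection(p, a, b, tol=1e-12):
    """Shrink [a, b], on which p changes sign, until it is shorter than tol."""
    fa = p(a)
    while b - a > tol:
        mid = 0.5 * (a + b)
        fm = p(mid)
        if fa * fm <= 0:
            b = mid
        else:
            a, fa = mid, fm
    return 0.5 * (a + b)


def real_roots(p, lo=-10.1, hi=30.0, step=0.25):
    """Scan [lo, hi] for sign changes of p and refine each one by bisection."""
    roots = []
    for k in range(int(round((hi - lo) / step))):
        a = lo + k * step
        b = a + step
        if p(a) * p(b) < 0:
            roots.append(bisection(p, a, b))
    return roots


def f(x, y):
    return 4 * x**2 + 9 * y**2


critical_points = []
for lam in real_roots(quartic):
    x = lam / (lam - 4)  # from 8x = lam (2x - 2)
    y = lam / (lam - 9)  # from 18y = lam (2y - 2)
    critical_points.append((lam, x, y, f(x, y)))

for lam, x, y, val in critical_points:
    print(f"lambda = {lam:.6f}, x = {x:.6f}, y = {y:.6f}, f = {val:.6f}")
```

Running it should report $\lambda = 1.5$ with $f = 1.8$ and $\lambda \approx 13.6$ with $f \approx 86.675$, in agreement with the values above.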

Example Let $f(x, y, z) = x + y + z$. We wish to find the extrema of this function subject to the constraints $x^2 + y^2 = 1$ and $2x + z = 1$. That is, we must have $g_1(x, y, z) = g_2(x, y, z) = 0$, where $g_1(x, y, z) = x^2 + y^2 - 1$ and $g_2(x, y, z) = 2x + z - 1$. We must find $\lambda_1$ and $\lambda_2$ such that

$$\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2,$$

or

$$\begin{bmatrix} 1 & 1 & 1 \end{bmatrix} = \lambda_1 \begin{bmatrix} 2x & 2y & 0 \end{bmatrix} + \lambda_2 \begin{bmatrix} 2 & 0 & 1 \end{bmatrix}.$$

This equation, together with the constraints, yields the system of equations

$$\begin{aligned} 1 &= 2x\lambda_1 + 2\lambda_2, \\ 1 &= 2y\lambda_1, \\ 1 &= \lambda_2, \\ 1 &= x^2 + y^2, \\ 1 &= 2x + z. \end{aligned}$$

From the third equation, $\lambda_2 = 1$, which, by the first equation, yields $2x\lambda_1 = -1$. Comparing this with the second equation, $2y\lambda_1 = 1$, it follows that $x = -y$. This, in conjunction with the fourth equation, yields $(x, y) = (1/\sqrt{2}, -1/\sqrt{2})$ or $(x, y) = (-1/\sqrt{2}, 1/\sqrt{2})$. From the fifth equation, we obtain the two critical points

$$(x_1, y_1, z_1) = \left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, 1 - \sqrt{2}\right), \quad (x_2, y_2, z_2) = \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 1 + \sqrt{2}\right).$$

Substituting these points into $f$ yields $f(x_1, y_1, z_1) = 1 - \sqrt{2}$ and $f(x_2, y_2, z_2) = 1 + \sqrt{2}$, so we conclude that $(x_1, y_1, z_1)$ is a local minimum of $f$ and $(x_2, y_2, z_2)$ is a local maximum of $f$, subject to the constraints $g_1(x, y, z) = g_2(x, y, z) = 0$. □
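This conclusion can be cross-checked numerically. The sketch below (plain Python; the parametrization $x = \cos t$, $y = \sin t$, $z = 1 - 2\cos t$ of the curve cut out by both constraints is our own choice, not part of the method) samples $f$ along that curve, and also confirms that the Lagrange condition holds at the two critical points with $\lambda_2 = 1$ and $\lambda_1 = 1/(2y)$.

```python
import math


def f(x, y, z):
    return x + y + z


# Sample the curve on which both x^2 + y^2 = 1 and 2x + z = 1 hold.
N = 100_000
samples = []
for k in range(N):
    t = 2 * math.pi * k / N
    x, y = math.cos(t), math.sin(t)
    z = 1 - 2 * x
    samples.append((f(x, y, z), x, y, z))

f_min = min(samples)
f_max = max(samples)
print("sampled min:", f_min[0], "expected:", 1 - math.sqrt(2))
print("sampled max:", f_max[0], "expected:", 1 + math.sqrt(2))

# Lagrange condition: grad f = lam1 * grad g1 + lam2 * grad g2 at each critical point.
s = 1 / math.sqrt(2)
for x, y, z in [(s, -s, 1 - math.sqrt(2)), (-s, s, 1 + math.sqrt(2))]:
    lam1, lam2 = 1 / (2 * y), 1.0
    combo = [lam1 * 2 * x + lam2 * 2, lam1 * 2 * y, lam2 * 1]
    residual = max(abs(c - 1) for c in combo)
    print((x, y, z), "residual:", residual)
```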

The method of Lagrange multipliers can be used in conjunction with the method of finding unconstrained local maxima and minima in order to find the absolute maximum and minimum of a function on a compact (closed and bounded) set. The basic idea is as follows:

• Find the (unconstrained) critical points of the function, and exclude those that do not belong to the interior of the set.

• Use the method of Lagrange multipliers to find the constrained critical points that lie on the boundary of the set, using equations that characterize the boundary points as constraints. Also, include the corners of the boundary: there, the function restricted to the boundary is not differentiable, so the corners must be treated as critical points as well.

• Evaluate the function at all of the constrained and unconstrained critical points. The largest value is the absolute maximum value on the set, and the smallest value is the absolute minimum value on the set.
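As an illustration of these three steps, the sketch below applies them to a hypothetical extension of the first example above: finding the absolute extrema of $f(x, y) = 4x^2 + 9y^2$ on the closed disk $(x - 1)^2 + (y - 1)^2 \le 4$, whose boundary is exactly the constraint curve $g(x, y) = 0$ of that example. The boundary candidates are copied from that example (a circle has no corners), and the brute-force search at the end is only a sanity check, not part of the method.

```python
import math


def f(x, y):
    return 4 * x**2 + 9 * y**2


def in_disk(x, y):
    return (x - 1) ** 2 + (y - 1) ** 2 <= 4


# Step 1: grad f = (8x, 18y) vanishes only at (0, 0); keep it only if it is interior.
interior = [p for p in [(0.0, 0.0)] if (p[0] - 1) ** 2 + (p[1] - 1) ** 2 < 4]

# Step 2: constrained critical points on the boundary circle, copied from the
# Lagrange multiplier example (the circle is smooth, so there are no corners).
boundary = [(-0.6, -0.2), (1.416626, 2.956124)]

# Step 3: evaluate f at every candidate; the extremes are the absolute extrema.
candidates = interior + boundary
abs_min = min((f(x, y), (x, y)) for x, y in candidates)
abs_max = max((f(x, y), (x, y)) for x, y in candidates)
print("absolute min:", abs_min)
print("absolute max:", abs_max)

# Sanity check only: brute-force values on a grid inside the disk and on its boundary.
h = 0.01
checks = [f(-1 + i * h, -1 + j * h)
          for i in range(401) for j in range(401)
          if in_disk(-1 + i * h, -1 + j * h)]
checks += [f(1 + 2 * math.cos(t), 1 + 2 * math.sin(t))
           for t in (2 * math.pi * k / 20000 for k in range(20000))]
print("brute-force min/max:", min(checks), max(checks))
```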

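From a linear algebra point of view, $\nabla f(\mathbf{x}_0)$ must be orthogonal to any vector $\mathbf{u}$ in the null space of $J_{\mathbf{g}}(\mathbf{x}_0)$ (that is, the set consisting of all vectors $\mathbf{v}$ such that $J_{\mathbf{g}}(\mathbf{x}_0)\mathbf{v} = \mathbf{0}$), and therefore it must lie in the range of $J_{\mathbf{g}}(\mathbf{x}_0)^T$, the transpose of $J_{\mathbf{g}}(\mathbf{x}_0)$. That is, $\nabla f(\mathbf{x}_0) = J_{\mathbf{g}}(\mathbf{x}_0)^T\boldsymbol{\lambda}$ for some vector $\boldsymbol{\lambda}$, meaning that $\nabla f(\mathbf{x}_0)$ must be a linear combination of the rows of $J_{\mathbf{g}}(\mathbf{x}_0)$ (the columns of $J_{\mathbf{g}}(\mathbf{x}_0)^T$), which are the gradients of the component functions of $\mathbf{g}$ at $\mathbf{x}_0$.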
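For the second example above, where $n = 3$ and $m = 2$, this orthogonality condition is easy to test: when $\nabla g_1$ and $\nabla g_2$ are linearly independent, the null space of the $2 \times 3$ Jacobian $J_{\mathbf{g}}$ is spanned by their cross product. The sketch below (plain Python; the helper names are our own) computes that null-space direction at points satisfying both constraints and checks whether $\nabla f$ is orthogonal to it. The point $(1, 0, -1)$ also satisfies both constraints but is not a critical point, and is included for contrast.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def null_direction(x, y, z):
    """Spans the null space of J_g for g1 = x^2 + y^2 - 1, g2 = 2x + z - 1."""
    grad_g1 = (2 * x, 2 * y, 0.0)
    grad_g2 = (2.0, 0.0, 1.0)
    return cross(grad_g1, grad_g2)


grad_f = (1.0, 1.0, 1.0)  # gradient of f(x, y, z) = x + y + z
s = 1 / math.sqrt(2)
points = {
    "critical (x1, y1, z1)": (s, -s, 1 - math.sqrt(2)),
    "critical (x2, y2, z2)": (-s, s, 1 + math.sqrt(2)),
    "non-critical (1, 0, -1)": (1.0, 0.0, -1.0),
}
results = {}
for name, p in points.items():
    u = null_direction(*p)
    results[name] = dot(grad_f, u)
    print(name, "grad f . u =", results[name])
```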

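Another way to view the method of Lagrange multipliers is as a modified unconstrained optimization problem. If we define the function $h(\mathbf{x}, \boldsymbol{\lambda})$ by

$$h(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) - \boldsymbol{\lambda} \cdot \mathbf{g}(\mathbf{x}) = f(\mathbf{x}) - \sum_{i=1}^m \lambda_i g_i(\mathbf{x}),$$

then we can find constrained extrema of $f$ by finding unconstrained critical points of $h$, for

$$\nabla h(\mathbf{x}, \boldsymbol{\lambda}) = \begin{bmatrix} \nabla f(\mathbf{x}) - \boldsymbol{\lambda}^T J_{\mathbf{g}}(\mathbf{x}) & -\mathbf{g}(\mathbf{x})^T \end{bmatrix}.$$

Because all components of the gradient must be equal to zero at a critical point (when the gradient exists), the constraints must be satisfied at a critical point of $h$, and $\nabla f$ must be a linear combination of the $\nabla g_i$, so $f$ is only changing along directions that violate the constraints. Therefore, a critical point of $h$ is a candidate for a constrained maximum or minimum of $f$. Such a critical point is typically a saddle point of $h$ rather than an extremum of $h$, so the ordinary Second Derivatives Test does not apply directly; instead, the Hessian of $h$ (known as the bordered Hessian) can be used, with a modified test, to determine whether a constrained critical point is a maximum or minimum.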
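For the single-constraint example above, both claims can be checked numerically. The sketch below (plain Python; the central-difference step size and the perturbation sizes are arbitrary choices) forms $h(x, y, \lambda) = f(x, y) - \lambda g(x, y)$, confirms that its gradient vanishes at $(x_1, y_1, \lambda_1) = (-3/5, -1/5, 3/2)$, and then shows that $h$ takes values both above and below $h(x_1, y_1, \lambda_1)$ near that point, so it is a saddle point of $h$.

```python
def f(x, y):
    return 4 * x**2 + 9 * y**2


def g(x, y):
    return x**2 + y**2 - 2 * x - 2 * y - 2


def h(x, y, lam):
    """Lagrangian h = f - lam * g for the single-constraint example."""
    return f(x, y) - lam * g(x, y)


def grad_h(x, y, lam, step=1e-6):
    """Central-difference approximation of (h_x, h_y, h_lam)."""
    return (
        (h(x + step, y, lam) - h(x - step, y, lam)) / (2 * step),
        (h(x, y + step, lam) - h(x, y - step, lam)) / (2 * step),
        (h(x, y, lam + step) - h(x, y, lam - step)) / (2 * step),
    )


x1, y1, lam1 = -0.6, -0.2, 1.5
print("grad h at (x1, y1, lambda1):", grad_h(x1, y1, lam1))

# Move x slightly and push lambda up or down: h rises in one case, falls in the other.
base = h(x1, y1, lam1)
up = h(x1 + 0.01, y1, lam1 + 1) - base
down = h(x1 + 0.01, y1, lam1 - 1) - base
print("change in h:", up, down)
```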


1.10 Appendix: Linear Algebra Concepts

1.10.1 Matrix Multiplication

As we work with Jacobian matrices for vector-valued functions of several variables, matrix multiplication is a highly relevant operation in multivariable calculus. We have previously defined the product of an $m \times n$ matrix $A$ (that is, $A$ has $m$ rows and $n$ columns) and an $n \times p$ matrix $B$ as the $m \times p$ matrix $C = AB$, where the entry in row $i$ and column $j$ of $C$ is the dot product of row $i$ of $A$ and column $j$ of $B$. This can be written using sigma notation as

$$c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}, \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, p.$$

Note that the number of columns in $A$ must equal the number of rows in $B$, or the product $AB$ is undefined. Furthermore, in general, even if $A$ and $B$ can be multiplied in either order (for example, if they are square matrices of the same size), $AB$ does not necessarily equal $BA$. In the special case where the matrix $B$ is actually a column vector $\mathbf{x}$ with $n$ components (that is, $p = 1$), it is useful to be able to recognize the summation

$$y_i = \sum_{j=1}^n a_{ij} x_j$$

as the formula for the $i$th component of the vector $\mathbf{y} = A\mathbf{x}$.

Example Let $A$ be a $3 \times 2$ matrix, and $B$ be a $2 \times 2$ matrix, whose entries are given by

$$A = \begin{bmatrix} 1 & -2 \\ -3 & 4 \\ 5 & -6 \end{bmatrix}, \quad B = \begin{bmatrix} -7 & 8 \\ 9 & -10 \end{bmatrix}.$$

Then, because the number of columns in $A$ is equal to the number of rows in $B$, the product $C = AB$ is defined, and equal to the $3 \times 2$ matrix

$$C = \begin{bmatrix} 1(-7) + (-2)9 & 1(8) + (-2)(-10) \\ (-3)(-7) + 4(9) & (-3)(8) + 4(-10) \\ 5(-7) + (-6)9 & 5(8) + (-6)(-10) \end{bmatrix} = \begin{bmatrix} -25 & 28 \\ 57 & -64 \\ -89 & 100 \end{bmatrix}.$$

Because the number of columns in $B$ is not the same as the number of rows in $A$, it does not make sense to compute the product $BA$. □
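The sigma-notation formula translates directly into nested loops. A minimal pure-Python sketch (the function name `matmul` is our own; in practice a numerical library would be used) that reproduces this example and rejects the undefined product $BA$:

```python
def matmul(A, B):
    """Return C = AB using c_ij = sum_k a_ik * b_kj; A is m x n, B is n x p."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError(f"cannot multiply {m}x{n} by {len(B)}x{len(B[0])}")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


A = [[1, -2], [-3, 4], [5, -6]]   # 3 x 2
B = [[-7, 8], [9, -10]]           # 2 x 2
print(matmul(A, B))  # [[-25, 28], [57, -64], [-89, 100]]

try:
    matmul(B, A)     # B has 2 columns, but A has 3 rows
except ValueError as err:
    print("BA is undefined:", err)
```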


In multivariable calculus, matrix multiplication most commonly arises when applying the Chain Rule, because the Jacobian matrix of the composition $\mathbf{f} \circ \mathbf{g}$ at a point $\mathbf{x}_0$ in the domain of $\mathbf{g}$ is the product of the Jacobian matrix of $\mathbf{f}$, evaluated at $\mathbf{g}(\mathbf{x}_0)$, and the Jacobian matrix of $\mathbf{g}$ evaluated at $\mathbf{x}_0$. It follows that the Chain Rule only makes sense when composing functions $\mathbf{f}$ and $\mathbf{g}$ such that the number of dependent variables of $\mathbf{g}$ (that is, the number of rows in its Jacobian matrix) equals the number of independent variables of $\mathbf{f}$ (that is, the number of columns in its Jacobian matrix).

Matrix multiplication also arises in Taylor series expansions of multivariable functions, because if $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$, then the Taylor expansion of $f$ around $\mathbf{x}_0 \in D$ involves the dot product of $\nabla f(\mathbf{x}_0)$ with the vector $\mathbf{x} - \mathbf{x}_0$, which is a multiplication of a $1 \times n$ matrix with an $n \times 1$ matrix to produce a scalar (by convention, the gradient is written as a row vector, while points are written as column vectors). Also, such an expansion involves the dot product of $\mathbf{x} - \mathbf{x}_0$ with the product of the Hessian matrix, the matrix of second partial derivatives at $\mathbf{x}_0$, and the vector $\mathbf{x} - \mathbf{x}_0$. Finally, if $\mathbf{g} : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is a vector-valued function of $n$ variables, then the second term in its Taylor expansion around $\mathbf{x}_0 \in U$ is the product of the Jacobian matrix of $\mathbf{g}$ at $\mathbf{x}_0$ and the vector $\mathbf{x} - \mathbf{x}_0$.

1.10.2 Eigenvalues

Previously, it was mentioned that the eigenvalues of a matrix that is both symmetric and positive definite are positive. A scalar $\lambda$, which can be real or complex, is an eigenvalue of an $n \times n$ matrix $A$ (that is, $A$ has $n$ rows and $n$ columns) if there exists a nonzero vector $\mathbf{x}$ such that

$$A\mathbf{x} = \lambda\mathbf{x}.$$

That is, matrix-vector multiplication of $A$ and $\mathbf{x}$ reduces to a simple scaling of $\mathbf{x}$ by $\lambda$. The vector $\mathbf{x}$ is called an eigenvector of $A$ corresponding to $\lambda$.

The eigenvalues of $A$ are the roots of the characteristic polynomial $\det(A - \lambda I)$, which is a polynomial of degree $n$ in the variable $\lambda$. Therefore, an $n \times n$ matrix $A$ has $n$ eigenvalues, counted with multiplicity (so some may repeat). Although the eigenvalues of a matrix may be real or complex, even when the matrix is real, the eigenvalues of a real, symmetric matrix, such as the Hessian of any function with continuous second partial derivatives, are real.

For a general matrix $A$, $\det(A)$, the determinant of $A$, is the product of all of the eigenvalues of $A$. The trace of $A$, denoted by $\operatorname{tr}(A)$, which is defined to be the sum of the diagonal entries of $A$, is also the sum of the eigenvalues of $A$. It follows that when $A$ is a $2 \times 2$ symmetric matrix, the determinant and trace can be used to easily determine whether the eigenvalues of $A$ are both positive, both negative, or of opposite signs. This is the basis for the Second Derivatives Test for functions of two variables.

Example Let $A$ be a symmetric $2 \times 2$ matrix defined by

$$A = \begin{bmatrix} 4 & -6 \\ -6 & 10 \end{bmatrix}.$$

Then

$$\operatorname{tr}(A) = 4 + 10 = 14, \quad \det(A) = 4(10) - (-6)(-6) = 4.$$

It follows that the product and the sum of $A$'s two eigenvalues are both positive. Because $A$ is symmetric, its eigenvalues are also real. Therefore, they must both also be positive, and we can conclude that $A$ is positive definite.

To actually compute the eigenvalues, we can compute its characteristic polynomial, which is

$$\det(A - \lambda I) = \det\left(\begin{bmatrix} 4 - \lambda & -6 \\ -6 & 10 - \lambda \end{bmatrix}\right) = (4 - \lambda)(10 - \lambda) - (-6)(-6) = \lambda^2 - 14\lambda + 4.$$

Note that

$$\det(A - \lambda I) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A),$$

which is true for $2 \times 2$ matrices in general. To compute the eigenvalues, we use the quadratic formula to compute the roots of this polynomial, and obtain

$$\lambda = \frac{14 \pm \sqrt{14^2 - 4(1)(4)}}{2(1)} = 7 \pm 3\sqrt{5} \approx 13.708, \ 0.292.$$

If $A$ represented the Hessian of a function $f(x, y)$ at a point $(x_0, y_0)$, and $\nabla f(x_0, y_0) = \mathbf{0}$, then $f$ would have a local minimum at $(x_0, y_0)$. □
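The trace-and-determinant reasoning of this example fits in a few lines of plain Python. In the sketch below (function names are our own), the eigenvalues of the symmetric matrix $\begin{bmatrix} a & b \\ b & d \end{bmatrix}$ are computed as roots of $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$, and the sign pattern is read off from $\det(A)$ and $\operatorname{tr}(A)$ alone.

```python
import math


def eig_2x2_symmetric(a, b, d):
    """Eigenvalues of [[a, b], [b, d]] as roots of lambda^2 - tr*lambda + det."""
    tr = a + d
    det = a * d - b * b
    disc = tr * tr - 4 * det   # equals (a - d)^2 + 4b^2 >= 0, so the roots are real
    root = math.sqrt(disc)
    return (tr + root) / 2, (tr - root) / 2, tr, det


def classify(tr, det):
    """Sign pattern of the two real eigenvalues of a symmetric 2x2 matrix."""
    if det > 0:
        return "positive definite" if tr > 0 else "negative definite"
    if det < 0:
        return "indefinite (eigenvalues of opposite signs)"
    return "singular (at least one zero eigenvalue)"


lam1, lam2, tr, det = eig_2x2_symmetric(4, -6, 10)
print(lam1, lam2)          # approximately 13.708 and 0.292
print(classify(tr, det))   # positive definite
```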

1.10.3 The Transpose, Inner Product and Null Space

The dot product of two vectors $\mathbf{u}$ and $\mathbf{v}$, denoted by $\mathbf{u} \cdot \mathbf{v}$, can also be written as $\mathbf{u}^T\mathbf{v}$, where $\mathbf{u}$ and $\mathbf{v}$ are both column vectors, and $\mathbf{u}^T$ is the transpose of $\mathbf{u}$, which converts $\mathbf{u}$ into a row vector. In general, the transpose of a matrix $A$ is the matrix $A^T$ whose entries are defined by $[A^T]_{ij} = [A]_{ji}$. That is, in the transpose, the roles of rows and columns are reversed. The dot product is also known as an inner product; the outer product of two column vectors $\mathbf{u}$ and $\mathbf{v}$ is $\mathbf{u}\mathbf{v}^T$, which is a matrix, whereas the inner product is a scalar.

Given an $m \times n$ matrix $A$, the null space of $A$ is the set $\mathcal{N}(A)$ of all $n$-vectors $\mathbf{x}$ such that $A\mathbf{x} = \mathbf{0}$. If $\mathbf{x}$ is such a vector, then for any $m$-vector $\mathbf{v}$, $\mathbf{v}^T(A\mathbf{x}) = \mathbf{v}^T\mathbf{0} = 0$. Using two properties of the transpose, $(A^T)^T = A$ and $(AB)^T = B^TA^T$, this inner product can be rewritten as $\mathbf{v}^TA\mathbf{x} = \mathbf{v}^T(A^T)^T\mathbf{x} = (A^T\mathbf{v})^T\mathbf{x}$. It follows that any vector in $\mathcal{N}(A)$ is orthogonal to any vector in the range of $A^T$, denoted by $\mathcal{R}(A^T)$, which is the set of all $n$-vectors of the form $A^T\mathbf{v}$, where $\mathbf{v}$ is an $m$-vector. This is the basis for the condition $\nabla f = J_{\mathbf{g}}^T\boldsymbol{\lambda}$ in the method of Lagrange multipliers when there are multiple constraints.

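Example Let

$$A = \begin{bmatrix} 1 & -2 & 4 \\ 1 & 3 & -6 \\ 1 & -5 & 10 \end{bmatrix}.$$

Then

$$A^T = \begin{bmatrix} 1 & 1 & 1 \\ -2 & 3 & -5 \\ 4 & -6 & 10 \end{bmatrix}.$$

The null space of $A$, $\mathcal{N}(A)$, consists of all vectors that are multiples of the vector

$$\mathbf{v} = \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix},$$

as it can be verified by matrix-vector multiplication that $A\mathbf{v} = \mathbf{0}$. Now, if we let $\mathbf{w}$ be any vector in $\mathbb{R}^3$, and we compute $\mathbf{u} = A^T\mathbf{w}$, then $\mathbf{v} \cdot \mathbf{u} = \mathbf{v}^T\mathbf{u} = 0$, because

$$\mathbf{v}^T\mathbf{u} = \mathbf{v}^TA^T\mathbf{w} = (A\mathbf{v})^T\mathbf{w} = \mathbf{0}^T\mathbf{w} = 0.$$

For example, it can be confirmed directly that $\mathbf{v}$ is orthogonal to each of the columns of $A^T$. □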
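These identities can be confirmed with a few lines of plain Python (helper names are our own; the random integer test vectors $\mathbf{w}$ are an arbitrary choice). Integer arithmetic makes every check exact.

```python
import random

A = [[1, -2, 4], [1, 3, -6], [1, -5, 10]]
v = [0, 2, 1]


def matvec(M, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


AT = transpose(A)
print("Av =", matvec(A, v))   # [0, 0, 0]

# v should be orthogonal to A^T w for every w; try a few random integer vectors.
random.seed(0)
for _ in range(5):
    w = [random.randint(-9, 9) for _ in range(3)]
    u = matvec(AT, w)
    print(w, "v . (A^T w) =", sum(vi * ui for vi, ui in zip(v, u)))
```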

Chapter 2

Multiple Integrals

2.1 Double Integrals over Rectangles

In single-variable calculus, the definite integral of a function $f(x)$ over an interval $[a, b]$ was defined to be

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^n f(x_i^*)\,\Delta x,$$

where $\Delta x = (b - a)/n$, and, for each $i$, $x_{i-1} \le x_i^* \le x_i$, where $x_i = a + i\Delta x$.

The purpose of the definite integral is to compute the area of a region with a curved boundary, using the formula for the area of a rectangle. The summation used to define the integral is the sum of the areas of $n$ rectangles, each with width $\Delta x$ and height $f(x_i^*)$, for $i = 1, 2, \ldots, n$. By taking the limit as $n$, the number of rectangles, tends to infinity, we obtain the sum of the areas of infinitely many rectangles of infinitely small width. For $f(x) \ge 0$, we define the area of the region bounded by the lines $x = a$, $y = 0$, $x = b$, and the curve $y = f(x)$ to be this limit, if it exists.

Unfortunately, it is too tedious to compute definite integrals using this definition. However, if we define the function $F(x)$ as the definite integral

$$F(x) = \int_a^x f(s)\,ds,$$

then we have

$$F'(x) = \lim_{h \to 0} \frac{1}{h}\left[\int_a^{x+h} f(s)\,ds - \int_a^x f(s)\,ds\right] = \lim_{h \to 0} \frac{1}{h}\int_x^{x+h} f(s)\,ds.$$

Intuitively, as $h \to 0$, this expression converges to the area of a rectangle of width $h$ and height $f(x)$, divided by the width, which is simply the height, $f(x)$. That is, $F'(x) = f(x)$. This leads to the Fundamental Theorem of Calculus, which states that

$$\int_a^b f(x)\,dx = F(b) - F(a),$$

where $F$ is an antiderivative of $f$; that is, $F' = f$. Therefore, definite integrals are typically evaluated by attempting to undo the differentiation process to find an antiderivative of the integrand $f(x)$, and then evaluating this antiderivative at $a$ and $b$, the limits of the integral.

Now, let $f(x, y)$ be a function of two variables. We consider the problem of computing the volume of the solid in 3-D space bounded by the surface $z = f(x, y)$, and the planes $x = a$, $x = b$, $y = c$, $y = d$, and $z = 0$, where $a$, $b$, $c$ and $d$ are constants. As before, we divide the interval $[a, b]$ into $n$ subintervals of width $\Delta x = (b - a)/n$, and we similarly divide the interval $[c, d]$ into $m$ subintervals of width $\Delta y = (d - c)/m$. For convenience, we also define $x_i = a + i\Delta x$, and $y_j = c + j\Delta y$.

Then, we can approximate the volume $V$ of this solid by the sum of the volumes of $mn$ boxes. The base of each box is a rectangle with dimensions $\Delta x$ and $\Delta y$, and the height is given by $f(x_i^*, y_j^*)$, where, for each $i$ and $j$, $x_{i-1} \le x_i^* \le x_i$ and $y_{j-1} \le y_j^* \le y_j$. That is,

$$V \approx \sum_{i=1}^n \sum_{j=1}^m f(x_i^*, y_j^*)\,\Delta y\,\Delta x.$$

We then obtain the exact volume of this solid by letting the numbers of subintervals, $m$ and $n$, tend to infinity. The result is the double integral of $f(x, y)$ over the rectangle $R = \{(x, y) \mid a \le x \le b, \ c \le y \le d\}$, which is also written as $R = [a, b] \times [c, d]$. The double integral is defined to be

$$V = \iint_R f(x, y)\,dA = \lim_{m, n \to \infty} \sum_{i=1}^n \sum_{j=1}^m f(x_i^*, y_j^*)\,\Delta y\,\Delta x,$$

which is equal to the volume of the given solid. The $dA$ corresponds to the quantity $\Delta A = \Delta x\,\Delta y$, and emphasizes the fact that the integral is defined to be the limit of the sum of volumes of boxes, each with a base of area $\Delta A$.
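A minimal sketch of this definition in plain Python, using the midpoint of each subrectangle as the sample point $(x_i^*, y_j^*)$ (one admissible choice among many). The test function $f(x, y) = x^2y + xy^3$ on $[0, 1] \times [0, 2]$ is borrowed from an example later in this section, where the exact value of the integral is found to be $8/3$; the sums approach that value as $m$ and $n$ grow.

```python
def double_riemann_sum(f, a, b, c, d, n, m):
    """Midpoint Riemann sum of f over [a, b] x [c, d] with n x m subrectangles."""
    dx = (b - a) / n
    dy = (d - c) / m
    total = 0.0
    for i in range(1, n + 1):
        x_star = a + (i - 0.5) * dx          # midpoint of [x_{i-1}, x_i]
        for j in range(1, m + 1):
            y_star = c + (j - 0.5) * dy      # midpoint of [y_{j-1}, y_j]
            total += f(x_star, y_star) * dy * dx
    return total


f = lambda x, y: x**2 * y + x * y**3
for n in (10, 40, 160):
    approx = double_riemann_sum(f, 0, 1, 0, 2, n, n)
    print(n, approx, abs(approx - 8 / 3))
```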

To evaluate double integrals of this form, we can proceed as in the single-variable case, by noting that if $f(x_0, y)$, a function of $y$, is integrable on $[c, d]$ for each $x_0 \in [a, b]$, then we have

$$\begin{aligned}
\iint_R f(x, y)\,dA &= \lim_{m, n \to \infty} \sum_{i=1}^n \sum_{j=1}^m f(x_i^*, y_j^*)\,\Delta y\,\Delta x \\
&= \lim_{n \to \infty} \sum_{i=1}^n \left[\lim_{m \to \infty} \sum_{j=1}^m f(x_i^*, y_j^*)\,\Delta y\right] \Delta x \\
&= \lim_{n \to \infty} \sum_{i=1}^n \left[\int_c^d f(x_i^*, y)\,dy\right] \Delta x \\
&= \int_a^b \int_c^d f(x, y)\,dy\,dx.
\end{aligned}$$

Similarly, if $f(x, y_0)$, a function of $x$, is integrable on $[a, b]$ for each $y_0 \in [c, d]$, we also have

$$\iint_R f(x, y)\,dA = \int_c^d \int_a^b f(x, y)\,dx\,dy.$$

This result is known as Fubini's Theorem, which states that a double integral of a function $f(x, y)$ can be evaluated as two iterated single integrals, provided that $f$ is integrable as a function of either variable when the other variable is held fixed. This is guaranteed if, for instance, $f(x, y)$ is continuous on the entire rectangle $R$.

That is, we can evaluate a double integral by performing partial integration with respect to either variable, $x$ or $y$, which entails applying the Fundamental Theorem of Calculus to integrate $f(x, y)$ with respect to only that variable, while treating the other variable as a constant. The result will be a function of only the other variable, to which the Fundamental Theorem of Calculus can be applied a second time to complete the evaluation of the double integral.
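The same partial-integration idea can be sketched numerically: integrate over one variable with the other held fixed, then integrate the result over the remaining variable. The sketch below substitutes composite Simpson's rule for exact antiderivatives (our own choice, made because it is exact for polynomials of degree at most 3) and applies it to $f(x, y) = x^2y + xy^3$ over $[0, 1] \times [0, 2]$, the function of the example that follows, in both orders of integration; Fubini's Theorem predicts that both orders give the same value.

```python
def simpson(func, lo, hi, n=20):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3


def f(x, y):
    return x**2 * y + x * y**3


# Inner integral over y (x held fixed), then outer integral over x.
dy_dx = simpson(lambda x: simpson(lambda y: f(x, y), 0, 2), 0, 1)
# Inner integral over x (y held fixed), then outer integral over y.
dx_dy = simpson(lambda y: simpson(lambda x: f(x, y), 0, 1), 0, 2)
print(dy_dx, dx_dy)   # both agree with 8/3 up to rounding error
```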

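Example Let $R = [0, 1] \times [0, 2]$, and let $f(x, y) = x^2y + xy^3$. We will use Fubini's Theorem to evaluate

$$\iint_R f(x, y)\,dy\,dx.$$

We have

$$\begin{aligned}
\iint_R f(x, y)\,dy\,dx &= \int_0^1 \int_0^2 x^2y + xy^3\,dy\,dx \\
&= \int_0^1 \left[\int_0^2 x^2y + xy^3\,dy\right] dx \\
&= \int_0^1 \left[\int_0^2 x^2y\,dy + \int_0^2 xy^3\,dy\right] dx \\
&= \int_0^1 \left[x^2 \int_0^2 y\,dy + x \int_0^2 y^3\,dy\right] dx \\
&= \int_0^1 \left[x^2 \left.\frac{y^2}{2}\right|_0^2 + x \left.\frac{y^4}{4}\right|_0^2\right] dx \\
&= \int_0^1 2x^2 + 4x\,dx \\
&= \left.\left(\frac{2x^3}{3} + 2x^2\right)\right|_0^1 \\
&= \frac{8}{3}.
\end{aligned}$$

□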

In view of Fubini's Theorem, a double integral is often written as

$$\iint_R f(x, y)\,dA = \iint_R f(x, y)\,dy\,dx = \iint_R f(x, y)\,dx\,dy.$$

Example We wish to compute the volume $V$ of the solid bounded by the planes $x = 1$, $x = 4$, $y = 0$, $y = 2$, $z = 0$, and $x + y + z = 8$. The plane that defines the top of this solid is also the graph of the function $z = f(x, y) = 8 - x - y$. It follows that the volume of the solid is given by the double integral

$$V = \iint_R 8 - x - y\,dA, \quad R = [1, 4] \times [0, 2].$$

Using Fubini's Theorem, we obtain

$$\begin{aligned}
V &= \iint_R 8 - x - y\,dA \\
&= \int_1^4 \left[\int_0^2 8 - x - y\,dy\right] dx \\
&= \int_1^4 \left.\left(8y - xy - \frac{y^2}{2}\right)\right|_0^2 dx \\
&= \int_1^4 14 - 2x\,dx \\
&= \left.(14x - x^2)\right|_1^4 \\
&= (56 - 16) - (14 - 1) \\
&= 27.
\end{aligned}$$

□

We conclude by noting some useful properties of the double integral, which are direct generalizations of corresponding properties for single integrals:

• Linearity: If $f(x, y)$ and $g(x, y)$ are both integrable over $R$, then

$$\iint_R [f(x, y) + g(x, y)]\,dA = \iint_R f(x, y)\,dA + \iint_R g(x, y)\,dA.$$

• Homogeneity: If $c$ is a constant, then

$$\iint_R cf(x, y)\,dA = c\iint_R f(x, y)\,dA.$$

• Monotonicity: If $f(x, y) \ge 0$ on $R$, then

$$\iint_R f(x, y)\,dA \ge 0.$$

• Additivity: If $R_1$ and $R_2$ are non-overlapping rectangles (they may share an edge) and $Q = R_1 \cup R_2$ is a rectangle, then

$$\iint_Q f(x, y)\,dA = \iint_{R_1} f(x, y)\,dA + \iint_{R_2} f(x, y)\,dA.$$

2.2 Double Integrals over More General Regions

We have learned how to integrate a function $f(x, y)$ of two variables over a rectangle $R$. However, it is important to be able to integrate such functions over more general regions, in order to be able to compute the volume of a wider variety of solids.

To that end, given a region $D \subset \mathbb{R}^2$, contained within a rectangle $R$, we define the double integral of $f(x, y)$ over $D$ by

$$\iint_D f(x, y)\,dA = \iint_R F(x, y)\,dA,$$

where

$$F(x, y) = \begin{cases} f(x, y) & (x, y) \in D, \\ 0 & (x, y) \in R, \ (x, y) \notin D. \end{cases}$$


It is possible to use Fubini's Theorem to compute integrals over certain types of general regions. We say that a region $D$ is of type I if it lies between the graphs of two continuous functions of $x$, and is also bounded by two vertical lines. Specifically,

$$D = \{(x, y) \mid a \le x \le b, \ g_1(x) \le y \le g_2(x)\}.$$

To integrate $f(x, y)$ over such a region, we can apply Fubini's Theorem. We let $R = [a, b] \times [c, d]$ be a rectangle that contains $D$. Then we have

$$\begin{aligned}
\iint_D f(x, y)\,dA &= \iint_R F(x, y)\,dA \\
&= \int_a^b \int_c^d F(x, y)\,dy\,dx \\
&= \int_a^b \int_{g_1(x)}^{g_2(x)} F(x, y)\,dy\,dx \\
&= \int_a^b \int_{g_1(x)}^{g_2(x)} f(x, y)\,dy\,dx.
\end{aligned}$$

This is valid because $F(x, y) = 0$ when $y < g_1(x)$ or $y > g_2(x)$, because in these cases, $(x, y)$ lies outside of $D$. The resulting iterated integral can be evaluated in the same way as iterated integrals over rectangles; the only difference is that when the limits of the inner integral are substituted for $y$ in the antiderivative of $f(x, y)$ with respect to $y$, the limits are functions of $x$, rather than constants.

A similar approach can be applied to a region of type II, which is bounded on the left and right by continuous functions of $y$, and bounded above and below by horizontal lines. Specifically, $D$ is a region of type II if

$$D = \{(x, y) \mid h_1(y) \le x \le h_2(y), \ c \le y \le d\}.$$

Using Fubini's Theorem, we obtain

$$\iint_D f(x, y)\,dA = \int_c^d \int_{h_1(y)}^{h_2(y)} f(x, y)\,dx\,dy.$$

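Example We wish to compute the volume of the solid under the plane $x + y + z = 8$, and bounded by the surfaces $y = x$ and $y = x^2$. These surfaces intersect along the vertical lines $x = 0, y = 0$ and $x = 1, y = 1$. It follows that the volume $V$ of the solid is given by the double integral

$$\int_0^1 \int_{x^2}^x 8 - x - y\,dy\,dx.$$

Note that $g_2(x) = x$ is the upper limit of integration, because $x^2 \le x$ when $0 \le x \le 1$. We have

$$\begin{aligned}
V &= \int_0^1 \int_{x^2}^x 8 - x - y\,dy\,dx \\
&= \int_0^1 \left.\left(8y - xy - \frac{y^2}{2}\right)\right|_{x^2}^{x} dx \\
&= \int_0^1 \left(8x - x^2 - \frac{x^2}{2}\right) - \left(8x^2 - x^3 - \frac{x^4}{2}\right) dx \\
&= \int_0^1 \frac{x^4}{2} + x^3 - \frac{19x^2}{2} + 8x\,dx \\
&= \left.\left(\frac{x^5}{10} + \frac{x^4}{4} - \frac{19x^3}{6} + 4x^2\right)\right|_0^1 \\
&= \frac{1}{10} + \frac{1}{4} - \frac{19}{6} + 4 \\
&= \frac{71}{60}.
\end{aligned}$$

□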
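This iterated integral can be checked numerically with a short sketch that integrates over a type I region by letting the limits of the inner integral depend on $x$. Composite Simpson's rule and the number of subintervals are our own choices, not part of the text; the helper name `integrate_type_I` is hypothetical.

```python
def simpson(func, lo, hi, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3


def integrate_type_I(f, a, b, g1, g2, n=200):
    """Integral of f over {a <= x <= b, g1(x) <= y <= g2(x)}, inner integral in y."""
    return simpson(lambda x: simpson(lambda y: f(x, y), g1(x), g2(x), n), a, b, n)


V = integrate_type_I(lambda x, y: 8 - x - y, 0, 1, lambda x: x**2, lambda x: x)
print(V, 71 / 60)
```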

Note that it is sometimes necessary to determine the intersections of surfaces that define a solid, in order to obtain the limits of integration.

To compute the volume of a solid that is bounded above and below (along the $z$-direction) by two different surfaces, we integrate the difference between the top and bottom surfaces. When the top surface lies above the plane $z = 0$ and the bottom surface lies below it, this amounts to adding the volume of the solid between the top surface and $z = 0$ to the volume of the solid between $z = 0$ and the lower surface; in general, it is equivalent to subtracting the (signed) volume under the lower surface from the volume under the upper surface.

Example We will compute the volume $V$ of the solid in the first octant bounded by the planes $z = 10 + x + y$, $z = 2 - x - y$, and $x = 0$, as well as the surfaces $y = \sin x$ and $y = \cos x$. As these surfaces intersect along the line $y = \sqrt{2}/2$, $x = \pi/4$, this volume is given by the double integral

$$\begin{aligned}
V &= \int_0^{\pi/4} \int_{\sin x}^{\cos x} (10 + x + y) - (2 - x - y)\,dy\,dx \\
&= \int_0^{\pi/4} \int_{\sin x}^{\cos x} 8 + 2x + 2y\,dy\,dx \\
&= \int_0^{\pi/4} \left.\left(8y + 2xy + y^2\right)\right|_{\sin x}^{\cos x} dx \\
&= \int_0^{\pi/4} (2x + 8)(\cos x - \sin x) + \cos^2 x - \sin^2 x\,dx \\
&= \int_0^{\pi/4} (2x + 8)(\cos x - \sin x) + \cos 2x\,dx \\
&= \left.\left(2x\sin x + 2x\cos x + 6\sin x + 10\cos x + \frac{1}{2}\sin 2x\right)\right|_0^{\pi/4} \\
&= \frac{\pi\sqrt{2}}{2} + 8\sqrt{2} - \frac{19}{2}.
\end{aligned}$$

The final anti-differentiation requires integration by parts,

$$\int u\,dv = uv - \int v\,du,$$

with $u = x$ and $dv = (\cos x - \sin x)\,dx$. The function $z = 10 + x + y$ is the "top" plane because, for $0 \le x \le \pi/4$ and $\sin x \le y \le \cos x$, both $x$ and $y$ are nonnegative, so $10 + x + y \ge 2 - x - y$. □
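A numerical cross-check of this result, under the same assumptions as any quadrature sketch (composite Simpson's rule and the subinterval count are our own choices; `integrate_type_I` is a hypothetical helper name). It also spot-checks on a small grid that $10 + x + y \ge 2 - x - y$ throughout the region.

```python
import math


def simpson(func, lo, hi, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3


def integrate_type_I(f, a, b, g1, g2, n=200):
    """Integral of f over {a <= x <= b, g1(x) <= y <= g2(x)}, inner integral in y."""
    return simpson(lambda x: simpson(lambda y: f(x, y), g1(x), g2(x), n), a, b, n)


top = lambda x, y: 10 + x + y
bottom = lambda x, y: 2 - x - y
V = integrate_type_I(lambda x, y: top(x, y) - bottom(x, y),
                     0, math.pi / 4, math.sin, math.cos)
exact = math.pi * math.sqrt(2) / 2 + 8 * math.sqrt(2) - 19 / 2
print(V, exact)

# Spot-check that z = 10 + x + y lies above z = 2 - x - y on the region.
xs = [k * math.pi / 400 for k in range(101)]
top_is_above = all(top(x, y) >= bottom(x, y)
                   for x in xs
                   for y in (math.sin(x) + t * (math.cos(x) - math.sin(x)) for t in (0, 0.5, 1)))
print(top_is_above)
```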

By setting the integrand $f(x, y) \equiv 1$ on a region $D$, and integrating over $D$, we can obtain $A(D)$, the area of $D$.

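Example We will compute the area of a half-disk by integrating $f(x, y) \equiv 1$ over the region $D$ bounded by the line $y = 0$ and the curve $y = \sqrt{1 - x^2}$. (Equivalently, this is the volume of the solid of height 1 over $D$, bounded by the planes $z = 0$ and $z = 1$.) This curve meets the line $y = 0$ at $x = 1$ and $x = -1$. Therefore the area is given by

$$A(D) = \int_{-1}^1 \int_0^{\sqrt{1 - x^2}} 1\,dy\,dx = \int_{-1}^1 \left. y \right|_0^{\sqrt{1 - x^2}} dx = \int_{-1}^1 \sqrt{1 - x^2}\,dx.$$

To evaluate this integral, we use the trigonometric substitution $x = \sin\theta$, for which $dx = \cos\theta\,d\theta$ and $\sqrt{1 - x^2} = \cos\theta$ for $-\pi/2 \le \theta \le \pi/2$, which yields

$$A(D) = \int_{-\pi/2}^{\pi/2} \cos^2\theta\,d\theta = \int_{-\pi/2}^{\pi/2} \frac{1 + \cos 2\theta}{2}\,d\theta = \left.\left(\frac{\theta}{2} + \frac{\sin 2\theta}{4}\right)\right|_{-\pi/2}^{\pi/2} = \frac{\pi}{2}.$$

□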

2.2.1 Changing the Order of Integration

In some cases, a region can be classified as being of either type I or type II, and therefore a function can be integrated over the region in two different ways. However, one approach or the other may be impractical, due to the complexity, or even impossibility, of carrying out the anti-differentiation. Therefore, it is important to be able to change the order of integration if necessary.

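Example Consider the double integral

$$\iint_D e^{y^3}\,dA,$$

where $D = \{(x, y) \mid 0 \le x \le 1, \ \sqrt{x} \le y \le 1\}$. This region is defined as a region of type I, so it is natural to attempt to evaluate the iterated integral

$$\int_0^1 \int_{\sqrt{x}}^1 e^{y^3}\,dy\,dx.$$

Unfortunately, $e^{y^3}$ has no antiderivative with respect to $y$ that can be expressed in terms of elementary functions. However, the region $D$ is also a region of type II, as it can be redefined as

$$D = \{(x, y) \mid 0 \le y \le 1, \ 0 \le x \le y^2\}.$$

We then have

$$\begin{aligned}
\iint_D e^{y^3}\,dA &= \int_0^1 \int_0^{y^2} e^{y^3}\,dx\,dy \\
&= \int_0^1 \left. xe^{y^3} \right|_0^{y^2} dy \\
&= \int_0^1 y^2 e^{y^3}\,dy \\
&= \frac{1}{3}\int_0^1 e^u\,du, \quad u = y^3, \\
&= \left.\frac{1}{3}e^u\right|_0^1 \\
&= \frac{1}{3}(e - 1).
\end{aligned}$$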

It should be noted that usually, when changing the order of integration, it is necessary to use the inverse functions of the functions that define the curved portions of the boundary, in order to obtain the limits of integration of the new inner integral.
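Numerical quadrature does not need an antiderivative, so both orders of integration in the preceding example can be compared directly. In the sketch below (composite Simpson's rule, our own choice), the type II order has a smooth outer integrand and matches $(e - 1)/3$ closely; the type I order also converges, but more slowly, because its outer integrand behaves like $\sqrt{x}$ near $x = 0$.

```python
import math


def simpson(func, lo, hi, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3


integrand = lambda y: math.exp(y**3)
exact = (math.e - 1) / 3

# Type I order: inner integral in y from sqrt(x) to 1, outer integral in x.
type_I = simpson(lambda x: simpson(integrand, math.sqrt(x), 1.0), 0.0, 1.0)
# Type II order: inner integral in x from 0 to y^2 (integrand constant in x).
type_II = simpson(lambda y: simpson(lambda x: integrand(y), 0.0, y**2), 0.0, 1.0)
print(type_I, type_II, exact)
```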


2.2.2 The Mean Value Theorem for Integrals

It is important to note that all of the properties of double integrals that have been previously discussed, including linearity, homogeneity, monotonicity, and additivity, apply to double integrals over non-rectangular regions as well. One additional property, which is a consequence of monotonicity, is that if $f(x, y) \ge m$ on a region $D$, and $f(x, y) \le M$ on $D$, then

$$mA(D) \le \iint_D f(x, y)\,dA \le MA(D),$$

where, as before, $A(D)$ is the area of $D$. Furthermore, if $f$ is continuous on $D$ (and $D$ is closed, bounded and connected), then, by the Mean Value Theorem for Double Integrals, we have

$$\iint_D f(x, y)\,dA = f(x_0, y_0)A(D),$$

where $(x_0, y_0)$ is some point in $D$. This is a generalization of the Mean Value Theorem for Integrals, which is closely related to the Mean Value Theorem for derivatives.

Example Consider the double integral

$$\iint_D e^y\,dA,$$

where $D$ is the triangle defined by $D = \{(x, y) \mid 0 \le x \le 1, \ 0 \le y \le 4x\}$. The area of this triangle is given by $A(D) = \frac{1}{2}bh$, where $b$, the base, is 1 and $h$, the height, is 4, which yields $A(D) = 2$. Because $1 \le e^y \le e^4$ when $0 \le y \le 4$, it follows that

$$2 \le \iint_D e^y\,dA \le 2e^4 \approx 109.2.$$

The exact value is $\frac{1}{4}(e^4 - 5) \approx 12.4$, which is between the above lower and upper bounds. □
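A quick numerical check of these bounds, sketched in plain Python with composite Simpson's rule as an assumed quadrature choice. Because the integral equals $f(x_0, y_0)A(D) = 2e^{y_0}$, the Mean Value Theorem also lets us solve for the height $y_0$ of a point where $e^y$ attains its average value over $D$, and check that it lies in $[0, 4]$.

```python
import math


def simpson(func, lo, hi, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3


area = 0.5 * 1 * 4   # A(D) = (1/2) * base * height
value = simpson(lambda x: simpson(math.exp, 0.0, 4 * x), 0.0, 1.0)
exact = (math.exp(4) - 5) / 4
print(value, exact)
print(area * 1 <= value <= area * math.exp(4))   # both bounds hold

y0 = math.log(value / area)   # e^{y0} equals the average value of e^y over D
print(y0)                     # about 1.82, inside [0, 4]
```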

2.3 Double Integrals in Polar Coordinates

We have learned how to integrate functions of two variables, $x$ and $y$, over various regions that have a simple form. The variables $x$ and $y$ correspond to Cartesian coordinates that are normally used to describe points in 2-D space. However, a region that may not be of type I or type II, when described using Cartesian coordinates, may be of one of these types if it is instead described using polar coordinates $r$ and $\theta$.

We recall that polar coordinates are related to Cartesian coordinates by the equations

$$x = r\cos\theta, \quad y = r\sin\theta,$$

or, alternatively,

$$r^2 = x^2 + y^2, \quad \tan\theta = \frac{y}{x}.$$

In order to integrate a function over a region defined using polar coordinates, we must derive the double integral in these coordinates, as was previously done in Cartesian coordinates.

Let a solid be bounded by the surface $z = f(r, \theta)$, as well as the surfaces $r = a$, $r = b$, $\theta = \alpha$ and $\theta = \beta$, which define a polar rectangle. To compute the volume of this solid, we can approximate it by several solids for which the volume can easily be computed. This is accomplished by dividing the polar rectangle into several smaller polar rectangles of dimensions $\Delta r$ and $\Delta\theta$. The height of each solid is obtained from the value of the function at a point in the polar rectangle.

Specifically, we divide the interval $[a, b]$ into $n$ subintervals of width $\Delta r = (b - a)/n$. Each subinterval is of the form $[r_{i-1}, r_i]$, where $r_i = a + i\Delta r$, for $i = 1, 2, \ldots, n$. Similarly, $[\alpha, \beta]$ is divided into $m$ subintervals of width $\Delta\theta = (\beta - \alpha)/m$, and each subinterval is of the form $[\theta_{j-1}, \theta_j]$, where $\theta_j = \alpha + j\Delta\theta$. Then, the volume $V$ of the solid is approximated by

$$V \approx \sum_{i=1}^n \sum_{j=1}^m \frac{1}{2} f(r_i^*, \theta_j^*)\left(r_i^2 - r_{i-1}^2\right)\Delta\theta,$$

where, for each $i$, $r_{i-1} \le r_i^* \le r_i$, and for each $j$, $\theta_{j-1} \le \theta_j^* \le \theta_j$. The quantity $\frac{1}{2}(r_i^2 - r_{i-1}^2)\Delta\theta$ is the area of a polar rectangle, which is not truly a rectangle, but rather the difference between two circular sectors with angle $\Delta\theta$ and radii $r_{i-1}$ and $r_i$. However, from

$$\frac{1}{2}\left(r_i^2 - r_{i-1}^2\right) = \frac{1}{2}(r_{i-1} + r_i)(r_i - r_{i-1}) = \frac{1}{2}(r_{i-1} + r_i)\Delta r,$$

we see that as $m, n \to \infty$, this approximation of the volume converges to the exact volume, which is given by the double integral

$$V = \int_\alpha^\beta \int_a^b f(r, \theta)\,r\,dr\,d\theta.$$

Note the extra factor of $r$ in the integrand, which is the limit as $n \to \infty$ of $(r_{i-1} + r_i)/2$.
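To see the factor of $r$ emerge numerically, the sketch below (plain Python; sample points at the midpoints of each subinterval, an assumption) evaluates the polar-rectangle sum above using the exact sector areas $\frac{1}{2}(r_i^2 - r_{i-1}^2)\Delta\theta$. It uses $f(r, \theta) = r\cos\theta + r\sin\theta$ on the polar rectangle $1 \le r \le 2$, $\pi/2 \le \theta \le 3\pi/2$, for which the example later in this section finds $\int\int f\,r\,dr\,d\theta = -14/3$.

```python
import math


def polar_riemann_sum(f, a, b, alpha, beta, n, m):
    """Sum of f(r*, theta*) times the exact area of each polar rectangle."""
    dr = (b - a) / n
    dtheta = (beta - alpha) / m
    total = 0.0
    for i in range(1, n + 1):
        r_prev, r_i = a + (i - 1) * dr, a + i * dr
        r_star = 0.5 * (r_prev + r_i)
        area_factor = 0.5 * (r_i**2 - r_prev**2)   # equals r_star * dr
        for j in range(1, m + 1):
            theta_star = alpha + (j - 0.5) * dtheta
            total += f(r_star, theta_star) * area_factor * dtheta
    return total


f = lambda r, t: r * math.cos(t) + r * math.sin(t)
for n in (10, 100):
    approx = polar_riemann_sum(f, 1, 2, math.pi / 2, 3 * math.pi / 2, n, n)
    print(n, approx, -14 / 3)
```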

If the base of the solid can be represented by a polar region of type I,

$$D = \{(r, \theta) \mid \alpha \le \theta \le \beta, \ h_1(\theta) \le r \le h_2(\theta)\},$$

then the volume $V$ of the solid defined by the surface $z = f(r, \theta)$ and the surfaces that define $D$ is given by the iterated integral

$$V = \int_\alpha^\beta \int_{h_1(\theta)}^{h_2(\theta)} f(r, \theta)\,r\,dr\,d\theta.$$

As before, if $f(r, \theta) \equiv 1$, then this integral yields $A(D)$, the area of $D$.

Example To evaluate the double integral

$$\iint_D x + y\,dA,$$

where $D = \{(x, y) \mid 1 \le x^2 + y^2 \le 4, \ x \le 0\}$, we convert the integrand, and the description of $D$, to polar coordinates. We then have

$$\iint_D r\cos\theta + r\sin\theta\,dA,$$

where $D = \{(r, \theta) \mid 1 \le r \le 2, \ \pi/2 \le \theta \le 3\pi/2\}$. This simplifies the integral considerably, because $D$ can be described as a polar rectangle. We then have

$$\begin{aligned}
\iint_D x + y\,dA &= \int_{\pi/2}^{3\pi/2} \int_1^2 (r\cos\theta + r\sin\theta)\,r\,dr\,d\theta \\
&= \int_{\pi/2}^{3\pi/2} \int_1^2 r^2(\cos\theta + \sin\theta)\,dr\,d\theta \\
&= \int_{\pi/2}^{3\pi/2} (\cos\theta + \sin\theta) \left.\frac{r^3}{3}\right|_1^2 d\theta \\
&= \frac{7}{3}\int_{\pi/2}^{3\pi/2} (\cos\theta + \sin\theta)\,d\theta \\
&= \frac{7}{3}\left.(\sin\theta - \cos\theta)\right|_{\pi/2}^{3\pi/2} \\
&= \frac{7}{3}[(-1 - 0) - (1 - 0)] \\
&= -\frac{14}{3}.
\end{aligned}$$

□

Example To compute the volume of the solid in the first octant bounded below by the cone $z = \sqrt{x^2 + y^2}$, and above by the sphere $x^2 + y^2 + z^2 = 8$, as well as the planes $y = x$ and $y = 0$, we first rewrite the equations of the bounding surfaces in polar coordinates. The solid is bounded below by the cone $z = r$, above by the sphere $r^2 + z^2 = 8$, and on the sides by the surfaces $\theta = 0$ and $\theta = \pi/4$, which are the planes $y = 0$ and $y = x$ within the first octant. The surfaces that bound the solid above and below intersect when $2r^2 = 8$, or $r = 2$. It follows that the volume is given by

$$\begin{aligned}
V &= \int_0^{\pi/4} \int_0^2 \left[\sqrt{8 - r^2} - r\right] r\,dr\,d\theta \\
&= \int_0^{\pi/4} \int_0^2 r\sqrt{8 - r^2}\,dr\,d\theta - \int_0^{\pi/4} \int_0^2 r^2\,dr\,d\theta \\
&= -\frac{1}{2}\int_0^{\pi/4} \int_8^4 u^{1/2}\,du\,d\theta - \int_0^{\pi/4} \left.\frac{r^3}{3}\right|_0^2 d\theta \\
&= \frac{1}{2}\int_0^{\pi/4} \left.\frac{2}{3}u^{3/2}\right|_4^8 d\theta - \int_0^{\pi/4} \frac{8}{3}\,d\theta \\
&= \frac{1}{3}\int_0^{\pi/4} \left[16\sqrt{2} - 8\right] d\theta - \frac{2\pi}{3} \\
&= \frac{4\pi}{3}\left[\sqrt{2} - 1\right].
\end{aligned}$$

In the third step, the substitution $u = 8 - r^2$ is used. Then, the limits of integration are interchanged in order to reverse the sign of the integral. □
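A numerical check of this volume, sketched with composite Simpson's rule (our own choice of method and subinterval count). Since the integrand $\left[\sqrt{8 - r^2} - r\right] r$ does not depend on $\theta$, the outer integral simply multiplies the inner one by $\pi/4$, but the sketch keeps both integrals to mirror the iterated form.

```python
import math


def simpson(func, lo, hi, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3


# Inner integral in r of (sphere height - cone height) times the polar factor r.
inner = simpson(lambda r: (math.sqrt(8 - r**2) - r) * r, 0.0, 2.0)
V = simpson(lambda theta: inner, 0.0, math.pi / 4)
exact = 4 * math.pi / 3 * (math.sqrt(2) - 1)
print(V, exact)
```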

Example The double integral

$$\int_{-1}^1 \int_0^{\sqrt{1 - x^2}} f(x, y)\,dy\,dx$$

can be converted to polar coordinates by converting the equation that describes the top boundary of the domain of integration, $y = \sqrt{1 - x^2}$, into a polar equation. We substitute $x = r\cos\theta$ and $y = r\sin\theta$ into this equation to obtain

$$r\sin\theta = \sqrt{1 - r^2\cos^2\theta}.$$

Squaring both sides yields $r^2\sin^2\theta = 1 - r^2\cos^2\theta$, and, in view of the identity $\cos^2\theta + \sin^2\theta = 1$, we obtain $r^2 = 1$, or the polar equation $r = 1$. Because the bottom boundary, $y = 0$, corresponds to the rays $\theta = 0$ and $\theta = \pi$, the integral can be expressed in polar coordinates as

$$\int_0^\pi \int_0^1 f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta.$$

□

Example We evaluate the double integral

$$\int_0^2 \int_0^{\sqrt{2x - x^2}} \sqrt{x^2 + y^2}\,dy\,dx$$

by converting to polar coordinates. By completing the square, we obtain $2x - x^2 = 1 - (x - 1)^2$. It follows that the region $D$ over which the integral is to be evaluated,

$$D = \{(x, y) \mid 0 \le x \le 2, \ 0 \le y \le \sqrt{2x - x^2}\},$$

has its top boundary defined by the equation $y = \sqrt{2x - x^2}$, or

$$(x - 1)^2 + y^2 = 1.$$

That is, the top boundary is the upper half of the circle with radius 1 and center $(1, 0)$. In polar coordinates, the equation of the top boundary becomes

$$(r\cos\theta - 1)^2 + r^2\sin^2\theta = 1,$$

or, upon expanding and simplifying to $r^2 = 2r\cos\theta$ and dividing by $r$,

$$r = 2\cos\theta.$$

The region $D$ is contained between the rays $\theta = 0$ and $\theta = \pi/2$. It follows that in polar coordinates, $D$ is defined by

$$D = \{(r, \theta) \mid 0 \le \theta \le \pi/2, \ 0 \le r \le 2\cos\theta\}.$$

The lower limit $r = 0$ is obtained from the fact that $D$ contains the origin. We thus obtain the integral

$$\int_0^2 \int_0^{\sqrt{2x - x^2}} \sqrt{x^2 + y^2}\,dy\,dx = \int_0^{\pi/2} \int_0^{2\cos\theta} r^2\,dr\,d\theta.$$

The integrand of the original integral is $r$, but the additional factor of $r$ required by the change to polar coordinates yields an integrand of $r^2$.


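Evaluating this integral, we obtain

$$\begin{aligned}
\int_0^2 \int_0^{\sqrt{2x - x^2}} \sqrt{x^2 + y^2}\,dy\,dx &= \int_0^{\pi/2} \int_0^{2\cos\theta} r^2\,dr\,d\theta \\
&= \int_0^{\pi/2} \left.\frac{r^3}{3}\right|_0^{2\cos\theta} d\theta \\
&= \frac{8}{3}\int_0^{\pi/2} \cos^3\theta\,d\theta \\
&= \frac{8}{3}\int_0^{\pi/2} \cos^2\theta\cos\theta\,d\theta \\
&= \frac{8}{3}\int_0^{\pi/2} (1 - \sin^2\theta)\cos\theta\,d\theta \\
&= \frac{8}{3}\int_0^{\pi/2} \cos\theta\,d\theta - \frac{8}{3}\int_0^{\pi/2} \sin^2\theta\cos\theta\,d\theta \\
&= \frac{8}{3}\left.\sin\theta\right|_0^{\pi/2} - \frac{8}{3}\int_0^1 u^2\,du \\
&= \frac{8}{3}(1) - \frac{8}{3}\left.\frac{u^3}{3}\right|_0^1 \\
&= \frac{16}{9}.
\end{aligned}$$

□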
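Both forms of this integral can be evaluated numerically and compared, which is a useful check on the polar description of $D$. In the sketch below (composite Simpson's rule, our own choice), the polar form converges quickly because its integrands are smooth; the Cartesian form converges more slowly because its outer integrand behaves like $\sqrt{2 - x}$ near $x = 2$.

```python
import math


def simpson(func, lo, hi, n=400):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3


# Cartesian form: outer x in [0, 2], inner y in [0, sqrt(2x - x^2)].
cartesian = simpson(
    lambda x: simpson(lambda y: math.sqrt(x**2 + y**2),
                      0.0, math.sqrt(max(2 * x - x**2, 0.0))),
    0.0, 2.0)
# Polar form: outer theta in [0, pi/2], inner r in [0, 2 cos(theta)], integrand r^2.
polar = simpson(lambda t: simpson(lambda r: r**2, 0.0, 2 * math.cos(t)),
                0.0, math.pi / 2)
print(cartesian, polar, 16 / 9)
```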

2.4 Triple Integrals

The integral of a function of three variables over a region $D \subset \mathbb{R}^3$ can be defined in a similar way as the double integral. Let $D$ be the box defined by

$$D = \{(x, y, z) \mid a \le x \le b, \ c \le y \le d, \ r \le z \le s\}.$$

Then, as with the double integral, we divide $[a, b]$ into $n$ subintervals $[x_{i-1}, x_i]$ of width $\Delta x = (b - a)/n$, for $i = 1, 2, \ldots, n$. Similarly, we divide $[c, d]$ into $m$ subintervals $[y_{j-1}, y_j]$ of width $\Delta y = (d - c)/m$, for $j = 1, 2, \ldots, m$, and divide $[r, s]$ into $\ell$ subintervals $[z_{k-1}, z_k]$ of width $\Delta z = (s - r)/\ell$, for $k = 1, 2, \ldots, \ell$.

Then, we can define the triple integral of a function f(x, y, z) over D by∫ ∫ ∫Df(x, y, z) dV = lim

m,n,`→∞

n∑i=1

m∑j=1

∑k=1

f(x∗i , y∗j , z∗k) ∆V,


where ∆V = ∆x∆y∆z. As with double integrals, the practical method ofevaluating a triple integral is as an iterated integral, such as∫ ∫ ∫

Df(x, y, z) dV =

∫ s

r

∫ d

c

∫ b

af(x, y, z) dx dy dz.

By Fubini’s Theorem, which generalizes to three or more dimensions, the order of integration can be rearranged when f is continuous on D.
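The limit definition can be illustrated with a short sketch. Here the sample point in each sub-box is taken to be its midpoint (one admissible choice of $(x_i^*, y_j^*, z_k^*)$), and the integrand f(x, y, z) = x²y + z on the box [0, 1] × [0, 2] × [0, 3] is a hypothetical test function whose exact integral is 11. The function name `triple_riemann_sum` is our own.

```python
def triple_riemann_sum(f, a, b, c, d, r, s, n, m, ell):
    """Riemann sum of f over the box [a, b] x [c, d] x [r, s] with n, m, ell
    subintervals, using the midpoint of each sub-box as the sample point."""
    dx, dy, dz = (b - a) / n, (d - c) / m, (s - r) / ell
    dV = dx * dy * dz
    total = 0.0
    for i in range(1, n + 1):
        x = a + (i - 0.5) * dx          # midpoint of [x_{i-1}, x_i]
        for j in range(1, m + 1):
            y = c + (j - 0.5) * dy      # midpoint of [y_{j-1}, y_j]
            for k in range(1, ell + 1):
                z = r + (k - 0.5) * dz  # midpoint of [z_{k-1}, z_k]
                total += f(x, y, z) * dV
    return total


def f(x, y, z):
    """Hypothetical test integrand; its integral over [0,1]x[0,2]x[0,3] is 11."""
    return x * x * y + z


sums = {n: triple_riemann_sum(f, 0, 1, 0, 2, 0, 3, n, n, n) for n in (10, 20, 40)}
for n, value in sums.items():
    print(n, value, abs(value - 11))
```

For this particular f, the midpoint sums with n = m = ℓ equal 11 − 1/(2n²), so the error shrinks by a factor of four each time n is doubled.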

A triple integral over a more general region can be defined in the same way as with double integrals. If E is a bounded subset of R³ that is contained within a box B, then we can define

$$\iiint_E f(x, y, z)\,dV = \iiint_B F(x, y, z)\,dV,$$

where

$$F(x, y, z) = \begin{cases} f(x, y, z), & (x, y, z) \in E, \\ 0, & (x, y, z) \notin E. \end{cases}$$

All of the properties previously associated with the double integral, such as linearity and additivity, generalize to the triple integral as well.

Just as regions were classified as type I or type II for double integrals, they can be classified for the purpose of setting up triple integrals. A solid region E is said to be of type 1 if it lies between the graphs of two continuous functions of x and y that are defined on a two-dimensional region D. Specifically,

$$E = \{(x, y, z) \mid (x, y) \in D,\ u_1(x, y) \le z \le u_2(x, y)\}.$$

Then, an integral of a function f(x, y, z) over E can be evaluated as

$$\iiint_E f(x, y, z)\,dV = \iint_D \left[ \int_{u_1(x, y)}^{u_2(x, y)} f(x, y, z)\,dz \right] dA,$$

where the double integral over D can be evaluated in a manner that is appropriate for the type of D.

For example, if D is of type I, then

$$E = \{(x, y, z) \mid a \le x \le b,\ g_1(x) \le y \le g_2(x),\ u_1(x, y) \le z \le u_2(x, y)\},$$

and therefore

$$\iiint_E f(x, y, z)\,dV = \int_a^b \int_{g_1(x)}^{g_2(x)} \int_{u_1(x, y)}^{u_2(x, y)} f(x, y, z)\,dz\,dy\,dx.$$


On the other hand, if E is of type 2, then it has a definition of the form

$$E = \{(x, y, z) \mid (y, z) \in D,\ u_1(y, z) \le x \le u_2(y, z)\}.$$

That is, E lies between the graphs of two continuous functions of y and z that are defined on a two-dimensional region D. Finally, if E is a region of type 3, then it lies between the graphs of two continuous functions of x and z. That is,

$$E = \{(x, y, z) \mid (x, z) \in D,\ u_1(x, z) \le y \le u_2(x, z)\}.$$

If more than one type applies to a given region E, then the order of integration can be chosen so that the integrand in each single integral that arises is as easy as possible to antidifferentiate.

Example Let E be the solid tetrahedron bounded by the planes x = 0, y = 0, z = 0 and x + y + z = 1. We wish to integrate the function f(x, y, z) = xz over this tetrahedron. From the given bounding planes, we see that the tetrahedron is bounded below by the plane z = 0 and above by the plane z = 1 − x − y. Therefore, we view E as a solid of type 1. This requires finding a region D in the xy-plane such that E is bounded by z = 0 and z = 1 − x − y on D.

We first note that these two planes intersect along the line x + y = 1. It follows that the base of E is a 2-D region D that can be described by the inequalities x ≥ 0, y ≥ 0, and x + y ≤ 1. This region is both of type I and of type II; we treat it as type I and obtain the description

$$D = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1 - x\}.$$

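Therefore, we can integrate f(x, y, z) over E as follows:

$$
\begin{aligned}
\iiint_E xz\,dV
&= \int_0^1 \int_0^{1-x} \int_0^{1-x-y} xz\,dz\,dy\,dx
= \int_0^1 x \int_0^{1-x} \int_0^{1-x-y} z\,dz\,dy\,dx \\
&= \int_0^1 x \int_0^{1-x} \left. \frac{z^2}{2} \right|_0^{1-x-y} dy\,dx
= \frac{1}{2} \int_0^1 x \int_0^{1-x} (1-x-y)^2\,dy\,dx \\
&= \frac{1}{2} \int_0^1 x \left. \left( -\frac{(1-x-y)^3}{3} \right) \right|_0^{1-x} dx
= \frac{1}{6} \int_0^1 x(1-x)^3\,dx \\
&= \frac{1}{6} \int_0^1 (1-u)u^3\,du, \qquad u = 1 - x \\
&= \frac{1}{6} \int_0^1 (u^3 - u^4)\,du
= \frac{1}{6} \left. \left( \frac{u^4}{4} - \frac{u^5}{5} \right) \right|_0^1
= \frac{1}{6} \left( \frac{1}{4} - \frac{1}{5} \right)
= \frac{1}{120}.
\end{aligned}
$$

□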
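A hedged numerical cross-check of this result: the sketch below evaluates the type-1 iterated integral with a nested midpoint rule, where the inner limits are supplied as functions of the outer variables exactly as in the description of E. The helper `iterated_triple` and the resolution `n = 60` are our own illustrative choices.

```python
def iterated_triple(f, x0, x1, y_lo, y_hi, z_lo, z_hi, n=60):
    """Nested midpoint rule for the type-1 iterated integral
    int_{x0}^{x1} int_{y_lo(x)}^{y_hi(x)} int_{z_lo(x,y)}^{z_hi(x,y)} f dz dy dx."""
    hx = (x1 - x0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        ya, yb = y_lo(x), y_hi(x)
        hy = (yb - ya) / n
        for j in range(n):
            y = ya + (j + 0.5) * hy
            za, zb = z_lo(x, y), z_hi(x, y)
            hz = (zb - za) / n
            inner = sum(f(x, y, za + (k + 0.5) * hz) for k in range(n))
            total += hx * hy * hz * inner
    return total


# Tetrahedron x, y, z >= 0, x + y + z <= 1, integrand f = xz
value = iterated_triple(lambda x, y, z: x * z,
                        0.0, 1.0,
                        lambda x: 0.0, lambda x: 1.0 - x,
                        lambda x, y: 0.0, lambda x, y: 1.0 - x - y)
print(value, 1 / 120)
```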

Example We will compute the volume of the solid E bounded by the surfaces y = x, y = x², z = x, and z = 0. Because E is bounded by two surfaces that define z as a function of x and y, we view E as a solid of type 1. It is bounded by the graphs of the functions z = 0 and z = x that are defined on a region D in the xy-plane. This region is bounded by the curves y = x and y = x². Because these curves intersect when x = 0 and x = 1, we can describe D as a region of type I:

$$D = \{(x, y) \mid 0 \le x \le 1,\ x^2 \le y \le x\}.$$

It follows that the volume of E is given by the iterated integral

$$
\begin{aligned}
\iiint_E 1\,dV
&= \int_0^1 \int_{x^2}^{x} \int_0^{x} 1\,dz\,dy\,dx
= \int_0^1 \int_{x^2}^{x} x\,dy\,dx
= \int_0^1 x \int_{x^2}^{x} 1\,dy\,dx \\
&= \int_0^1 x(x - x^2)\,dx
= \int_0^1 (x^2 - x^3)\,dx
= \left. \left( \frac{x^3}{3} - \frac{x^4}{4} \right) \right|_0^1
= \frac{1}{12}.
\end{aligned}
$$

□
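Because the z-integral of 1 from u₁ to u₂ is just the height u₂ − u₁, the volume of a type-1 solid reduces to a double integral of that height over D. The sketch below uses this reduction with a nested midpoint rule; the helper name `type1_volume` and the resolution are illustrative assumptions.

```python
def type1_volume(height, x0, x1, y_lo, y_hi, n=400):
    """Volume of a type-1 solid u1(x, y) <= z <= u2(x, y) over the type-I region
    D = {x0 <= x <= x1, y_lo(x) <= y <= y_hi(x)}, computed as the double integral
    of height(x, y) = u2(x, y) - u1(x, y) with a nested midpoint rule."""
    hx = (x1 - x0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        a, b = y_lo(x), y_hi(x)
        hy = (b - a) / n
        total += hx * hy * sum(height(x, a + (j + 0.5) * hy) for j in range(n))
    return total


# Solid between z = 0 and z = x over D = {0 <= x <= 1, x^2 <= y <= x}
volume = type1_volume(lambda x, y: x - 0.0, 0.0, 1.0,
                      lambda x: x * x, lambda x: x)
print(volume, 1 / 12)
```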

Example We evaluate the triple integral

$$\iiint_E x\,dV,$$

where E is the solid bounded by the paraboloid x = 4y² + 4z² and the plane x = 4. The paraboloid and the plane intersect when y² + z² = 1. It follows that the right boundary of the solid E is the unit disk y² + z² ≤ 1, contained within the plane x = 4. Because the paraboloid serves as the “left” boundary of E, we can define E by the inequalities

$$E = \{(x, y, z) \mid y^2 + z^2 \le 1,\ 4(y^2 + z^2) \le x \le 4\}.$$

Therefore, the triple integral can be written as an iterated integral

$$\iiint_E x\,dV = \iint_D \left[ \int_{4(y^2+z^2)}^{4} x\,dx \right] dA,$$

where D is the unit disk y² + z² ≤ 1 in the yz-plane. If we convert y and z to polar coordinates y = r cos θ and z = r sin θ, we can rewrite this integral as

$$\iiint_E x\,dV = \int_0^{2\pi} \int_0^1 \int_{4r^2}^{4} x\,r\,dx\,dr\,d\theta.$$

Evaluating this integral, we obtain

$$
\begin{aligned}
\iiint_E x\,dV
&= \int_0^{2\pi} \int_0^1 \int_{4r^2}^{4} x\,r\,dx\,dr\,d\theta
= \int_0^{2\pi} \int_0^1 r \left. \frac{x^2}{2} \right|_{4r^2}^{4} dr\,d\theta \\
&= 8 \int_0^{2\pi} \int_0^1 r(1 - r^4)\,dr\,d\theta
= 8 \int_0^{2\pi} \int_0^1 (r - r^5)\,dr\,d\theta \\
&= 8 \int_0^{2\pi} \left. \left( \frac{r^2}{2} - \frac{r^6}{6} \right) \right|_0^1 d\theta
= 8 \int_0^{2\pi} \frac{1}{3}\,d\theta
= \frac{16\pi}{3}.
\end{aligned}
$$

□
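The same iterated integral can be checked numerically. In this sketch (an illustration under our own naming, not the notes' method), the variables y and z are replaced by polar coordinates in the yz-plane, the factor r is included as the polar Jacobian, and each of θ, r and x is integrated with the midpoint rule. The resolutions `n_theta`, `n_r`, `n_x` are arbitrary choices.

```python
import math


def yz_polar_integral(f, r0, r1, x_lo, x_hi, n_theta=16, n_r=200, n_x=20):
    """Nested midpoint rule for the integral of f(x, y, z) over the solid
    0 <= theta <= 2*pi, r0 <= r <= r1, x_lo(r) <= x <= x_hi(r), where
    y = r cos(theta) and z = r sin(theta); the factor r is the polar Jacobian."""
    ht, hr = 2 * math.pi / n_theta, (r1 - r0) / n_r
    total = 0.0
    for i in range(n_theta):
        t = (i + 0.5) * ht
        for j in range(n_r):
            r = r0 + (j + 0.5) * hr
            y, z = r * math.cos(t), r * math.sin(t)
            xa, xb = x_lo(r), x_hi(r)
            hx = (xb - xa) / n_x
            inner = sum(f(xa + (k + 0.5) * hx, y, z) for k in range(n_x))
            total += inner * hx * r * hr * ht
    return total


# Solid between the paraboloid x = 4r^2 and the plane x = 4, integrand f = x
value = yz_polar_integral(lambda x, y, z: x, 0.0, 1.0,
                          lambda r: 4 * r * r, lambda r: 4.0)
print(value, 16 * math.pi / 3)
```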


2.5 Applications of Double and Triple Integrals

We now explore various applications of double and triple integrals arising from physics. When an object has constant density ρ, its mass m is equal to ρV, where V is its volume. Now, suppose that a flat plate, also known as a lamina, has a non-uniform density ρ(x, y), for (x, y) ∈ D, where D defines the shape of the lamina. Then, its mass is given by

$$m = \iint_D \rho(x, y)\,dA.$$

Similarly, if E is a solid region in 3-D space, and ρ(x, y, z) is the density of the solid at the point (x, y, z) ∈ E, then the mass of the solid is given by

$$m = \iiint_E \rho(x, y, z)\,dV.$$

We see that just as the integral allows simple “product” formulas for area and volume to be applied to more general problems, it allows similar formulas for quantities such as mass to be generalized as well.

The center of mass, also known as the center of gravity, of an object is the point at which the object behaves as if its entire mass is concentrated at that point. If the object is one- or two-dimensional, the center of mass is the point at which the object can be balanced horizontally (like a see-saw with riders at either end, in the one-dimensional case).

For a lamina with its shape defined by a bounded region D ⊂ R² and with density given by ρ(x, y), its center of mass (x̄, ȳ) is located at

$$\bar{x} = \frac{M_y}{m}, \qquad \bar{y} = \frac{M_x}{m},$$

where $M_x$ and $M_y$ are the moments of the lamina about the x-axis and y-axis, respectively. These are given by

$$M_x = \iint_D y\,\rho(x, y)\,dA, \qquad M_y = \iint_D x\,\rho(x, y)\,dA.$$

These integrals are obtained from the formula for the moment of a point mass about an axis, which is given by the product of the mass and its (signed) distance from the axis.

Similarly, the moments about the xy-, yz- and xz-planes, $M_{xy}$, $M_{yz}$ and $M_{xz}$, of a solid E ⊂ R³ with density ρ(x, y, z) are given by

$$M_{xy} = \iiint_E z\,\rho(x, y, z)\,dV, \qquad M_{yz} = \iiint_E x\,\rho(x, y, z)\,dV, \qquad M_{xz} = \iiint_E y\,\rho(x, y, z)\,dV.$$

It follows that its center of mass (x̄, ȳ, z̄) is located at

$$\bar{x} = \frac{M_{yz}}{m}, \qquad \bar{y} = \frac{M_{xz}}{m}, \qquad \bar{z} = \frac{M_{xy}}{m}.$$

As in the 2-D case, each moment is defined using the (signed) distance of each point of E from the coordinate plane about which the moment is being computed.

The moment of inertia, or second moment, of an object about an axis measures the object’s resistance to being set into rotation (angularly accelerated) about that axis. For a lamina defined by a region D ⊂ R² with density function ρ(x, y), its moments of inertia about the x-axis and y-axis, $I_x$ and $I_y$ respectively, are given by

$$I_x = \iint_D y^2 \rho(x, y)\,dA, \qquad I_y = \iint_D x^2 \rho(x, y)\,dA.$$

On the other hand, for a solid defined by a region E ⊂ R³ with density ρ(x, y, z), its moments of inertia about the coordinate axes are defined by

$$I_x = \iiint_E (y^2 + z^2)\rho(x, y, z)\,dV, \qquad I_y = \iiint_E (x^2 + z^2)\rho(x, y, z)\,dV,$$

$$I_z = \iiint_E (x^2 + y^2)\rho(x, y, z)\,dV.$$

The moment $I_z$ is also called the polar moment of inertia, or the moment of inertia about the origin, when E reduces to a lamina with density ρ(x, y).
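The formulas for mass, moments, center of mass and moments of inertia of a lamina all share one pattern: sum a weight times the mass dm = ρ dA of each small cell. The sketch below does this with the midpoint rule on a rectangle. The lamina (the unit square with density ρ(x, y) = 1 + x) is a hypothetical example chosen so that the exact answers are easy to verify by hand: m = 3/2, (x̄, ȳ) = (5/9, 1/2), $I_x = 1/2$, $I_y = 7/12$.

```python
def lamina_properties(rho, x0, x1, y0, y1, n=200):
    """Mass, center of mass and moments of inertia of a lamina occupying the
    rectangle [x0, x1] x [y0, y1] with density rho(x, y), via the midpoint rule."""
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    m = Mx = My = Ix = Iy = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            y = y0 + (j + 0.5) * hy
            dm = rho(x, y) * hx * hy   # mass of one small cell
            m += dm
            Mx += y * dm               # moment about the x-axis
            My += x * dm               # moment about the y-axis
            Ix += y * y * dm           # moment of inertia about the x-axis
            Iy += x * x * dm           # moment of inertia about the y-axis
    return {"m": m, "xbar": My / m, "ybar": Mx / m, "Ix": Ix, "Iy": Iy}


# Hypothetical lamina: unit square with density 1 + x
props = lamina_properties(lambda x, y: 1 + x, 0.0, 1.0, 0.0, 1.0)
print(props)
```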

2.6 Triple Integrals in Cylindrical Coordinates

We have seen that in some cases, it is convenient to evaluate double integrals by converting Cartesian coordinates (x, y) to polar coordinates (r, θ). The same is true of triple integrals. When this is the case, Cartesian coordinates (x, y, z) are converted to cylindrical coordinates (r, θ, z).

The relationships between (x, y) and (r, θ) are exactly the same as in polar coordinates, namely x = r cos θ and y = r sin θ, and the z coordinate is unchanged.


Example The point (x, y, z) = (−3, 3, 4) can be converted to cylindrical coordinates (r, θ, z) using the relationships from polar coordinates,

$$r = \sqrt{x^2 + y^2}, \qquad \tan\theta = \frac{y}{x}.$$

These relationships yield

$$r = \sqrt{(-3)^2 + 3^2} = \sqrt{18} = 3\sqrt{2}, \qquad \tan\theta = -1.$$

Since x = −3 < 0, we have θ = tan⁻¹(−1) + π = 3π/4. We conclude that the cylindrical coordinates of the point (−3, 3, 4) are (3√2, 3π/4, 4). □
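In code, the quadrant correction (adding π when x < 0) is usually handled by the two-argument arctangent. The sketch below uses Python’s `math.atan2` and normalizes θ to [0, 2π); the function names are our own.

```python
import math


def to_cylindrical(x, y, z):
    """Convert Cartesian (x, y, z) to cylindrical (r, theta, z) with theta in [0, 2*pi)."""
    r = math.hypot(x, y)
    # atan2 uses the signs of both y and x, so it performs the "+ pi" correction itself
    theta = math.atan2(y, x) % (2 * math.pi)
    return r, theta, z


def from_cylindrical(r, theta, z):
    """Convert cylindrical (r, theta, z) back to Cartesian coordinates."""
    return r * math.cos(theta), r * math.sin(theta), z


r, theta, z = to_cylindrical(-3, 3, 4)
print(r, theta, z)   # expected: r = 3*sqrt(2), theta = 3*pi/4, z = 4
```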

Furthermore, just as conversion to polar coordinates in double integrals introduces a factor of r in the integrand, conversion to cylindrical coordinates in triple integrals also introduces a factor of r, so that dV = r dz dr dθ.

Example We evaluate the triple integral

$$\iiint_E f(x, y, z)\,dV,$$

where E is the solid bounded below by the paraboloid z = x² + y², above by the plane z = 4, and on the side by the plane y = 0, so that y ≥ 0 throughout E. This integral can be evaluated as an iterated integral

$$\int_{-2}^{2} \int_0^{\sqrt{4 - x^2}} \int_{x^2 + y^2}^{4} f(x, y, z)\,dz\,dy\,dx,$$

but if we instead describe the region using cylindrical coordinates, we find that the solid is bounded below by the paraboloid z = r², above by the plane z = 4, and contained within the polar “box” 0 ≤ r ≤ 2, 0 ≤ θ ≤ π. We can therefore evaluate the iterated integral

$$\int_0^2 \int_0^{\pi} \int_{r^2}^{4} f(r\cos\theta, r\sin\theta, z)\,r\,dz\,d\theta\,dr,$$

which has much simpler limits. □

Example We use cylindrical coordinates to evaluate the triple integral

$$\iiint_E x\,dV,$$

where E is the solid bounded by the planes z = 0 and z = x + y + 5, and the cylinders x² + y² = 4 and x² + y² = 9. In cylindrical coordinates, E is bounded by the planes z = 0 and z = r(cos θ + sin θ) + 5, and the cylinders r = 2 and r = 3. It follows that the integral can be written as the iterated integral

$$\iiint_E x\,dV = \int_0^{2\pi} \int_2^3 \int_0^{r(\cos\theta + \sin\theta) + 5} (r\cos\theta)\,r\,dz\,dr\,d\theta.$$

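Evaluating this integral, we obtain

$$
\begin{aligned}
\iiint_E x\,dV
&= \int_0^{2\pi} \cos\theta \int_2^3 r^2 \int_0^{r(\cos\theta + \sin\theta) + 5} dz\,dr\,d\theta \\
&= \int_0^{2\pi} \cos\theta \int_2^3 \left[ r^3(\cos\theta + \sin\theta) + 5r^2 \right] dr\,d\theta \\
&= \int_0^{2\pi} \cos\theta(\cos\theta + \sin\theta) \int_2^3 r^3\,dr\,d\theta + \int_0^{2\pi} \cos\theta \int_2^3 5r^2\,dr\,d\theta \\
&= \int_0^{2\pi} \cos\theta(\cos\theta + \sin\theta) \left. \frac{r^4}{4} \right|_2^3 d\theta + 5 \int_0^{2\pi} \cos\theta \left. \frac{r^3}{3} \right|_2^3 d\theta \\
&= \frac{65}{4} \int_0^{2\pi} (\cos^2\theta + \sin\theta\cos\theta)\,d\theta + \frac{95}{3} \int_0^{2\pi} \cos\theta\,d\theta \\
&= \frac{65}{4} \int_0^{2\pi} \left[ \frac{1}{2}(1 + \cos 2\theta) + \frac{1}{2}\sin 2\theta \right] d\theta + \frac{95}{3} \left. \sin\theta \right|_0^{2\pi} \\
&= \frac{65}{4} \left. \left[ \frac{1}{2}\theta + \frac{1}{4}\sin 2\theta - \frac{1}{4}\cos 2\theta \right] \right|_0^{2\pi} + 0 \\
&= \frac{65\pi}{4}.
\end{aligned}
$$

□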
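A hedged numerical check of this example: the sketch integrates x over E directly in cylindrical coordinates, with the z-limits given as functions of (r, θ) and the factor r from dV = r dz dr dθ. The helper name and the resolutions are our own choices; because the θ-integrand is periodic, the midpoint rule in θ is very accurate here.

```python
import math


def cylindrical_integral(f, r0, r1, z_lo, z_hi, n_theta=64, n_r=200, n_z=4):
    """Nested midpoint rule for the integral of f(x, y, z) over
    0 <= theta <= 2*pi, r0 <= r <= r1, z_lo(r, theta) <= z <= z_hi(r, theta),
    including the cylindrical factor r in dV = r dz dr dtheta."""
    ht, hr = 2 * math.pi / n_theta, (r1 - r0) / n_r
    total = 0.0
    for i in range(n_theta):
        t = (i + 0.5) * ht
        c, s = math.cos(t), math.sin(t)
        for j in range(n_r):
            r = r0 + (j + 0.5) * hr
            za, zb = z_lo(r, t), z_hi(r, t)
            hz = (zb - za) / n_z
            inner = sum(f(r * c, r * s, za + (k + 0.5) * hz) for k in range(n_z))
            total += inner * hz * r * hr * ht
    return total


value = cylindrical_integral(lambda x, y, z: x, 2.0, 3.0,
                             lambda r, t: 0.0,
                             lambda r, t: r * (math.cos(t) + math.sin(t)) + 5)
print(value, 65 * math.pi / 4)
```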

2.7 Triple Integrals in Spherical Coordinates

Another approach to evaluating triple integrals, which is especially useful when integrating over regions that are at least partially defined using spheres, is to use spherical coordinates. Consider a point (x, y, z) that lies on a sphere of radius ρ centered at the origin. Then we know that x² + y² + z² = ρ². Furthermore, the points (0, 0, 0), (0, 0, z) and (x, y, z) form a right triangle with hypotenuse ρ and legs |z| and √(ρ² − z²).

If we denote by φ the angle between the positive z-axis and the segment from the origin to (x, y, z), then φ can be interpreted as an angle of inclination of the point (x, y, z). The angle φ = 0 corresponds to the “north pole” of the sphere, while φ = π/2 corresponds to the “equator”, and φ = π corresponds to the “south pole”. By right triangle trigonometry, we have

$$z = \rho\cos\phi.$$

It follows that x² + y² = ρ² sin² φ. If we define the angle θ to have the same meaning as in polar coordinates, then we have

$$x = \rho\sin\phi\cos\theta, \qquad y = \rho\sin\phi\sin\theta.$$

We define the spherical coordinates of (x, y, z) to be (ρ, θ, φ).

Example To convert the point (x, y, z) = (1, √3, −4) to spherical coordinates, we first compute

$$\rho = \sqrt{x^2 + y^2 + z^2} = \sqrt{1^2 + (\sqrt{3})^2 + (-4)^2} = \sqrt{20} = 2\sqrt{5}.$$

Next, we use the relation tan θ = y/x, and the fact that x = 1 > 0, to obtain

$$\theta = \tan^{-1}\frac{y}{x} = \tan^{-1}\sqrt{3} = \frac{\pi}{3}.$$

Finally, to obtain φ, we use the relation z = ρ cos φ, which yields

$$\phi = \cos^{-1}\frac{z}{\rho} = \cos^{-1}\left( -\frac{4}{2\sqrt{5}} \right) \approx 2.6779 \text{ radians}.$$

□
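The three steps of this conversion translate directly into code. This is a minimal sketch with our own function names; it uses `math.atan2` for θ (so the quadrant is handled automatically) and `math.acos` for φ, which returns values in [0, π] as required.

```python
import math


def to_spherical(x, y, z):
    """Convert Cartesian (x, y, z) to spherical (rho, theta, phi): theta in [0, 2*pi)
    as in polar coordinates, phi in [0, pi] measured down from the positive z-axis."""
    rho = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x) % (2 * math.pi)
    phi = math.acos(z / rho)   # assumes (x, y, z) is not the origin
    return rho, theta, phi


def from_spherical(rho, theta, phi):
    """Convert spherical (rho, theta, phi) back to Cartesian coordinates."""
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))


rho, theta, phi = to_spherical(1.0, math.sqrt(3.0), -4.0)
print(rho, theta, phi)   # approximately 4.4721, 1.0472 (pi/3), 2.6779
```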

To evaluate integrals in spherical coordinates, it is important to note that the volume of a “spherical box” of dimensions ∆ρ, ∆θ and ∆φ, as ∆ρ, ∆θ, ∆φ → 0, converges to the infinitesimal

$$\rho^2 \sin\phi\,d\rho\,d\theta\,d\phi,$$

where (ρ, θ, φ) denotes the location of the box in the limit. Therefore, the integral of a function f(x, y, z) over a solid E, when evaluated in spherical coordinates, becomes

$$\iiint_E f(x, y, z)\,dV = \iiint_E f(\rho\sin\phi\cos\theta,\ \rho\sin\phi\sin\theta,\ \rho\cos\phi)\,\rho^2 \sin\phi\,d\rho\,d\theta\,d\phi,$$

where, on the right-hand side, E is described by inequalities in ρ, θ and φ.

Example We wish to compute the volume of the solid E in the first octant that is bounded below by the plane z = 0 and the hemisphere x² + y² + z² = 9, bounded above by the hemisphere x² + y² + z² = 16, and bounded on the sides by the planes y = 0 and y = x. This would be highly inconvenient to evaluate in Cartesian coordinates; determining the limits in z alone requires breaking up the region of integration, since the lower limit in z depends on whether x² + y² < 9. However, in spherical coordinates, the solid E is determined by the inequalities

$$3 \le \rho \le 4, \qquad 0 \le \theta \le \frac{\pi}{4}, \qquad 0 \le \phi \le \frac{\pi}{2}.$$

That is, the solid is actually a “spherical rectangle”. It follows that the volume V is given by the iterated integral

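$$
\begin{aligned}
V &= \int_0^{\pi/2} \int_0^{\pi/4} \int_3^4 \rho^2 \sin\phi\,d\rho\,d\theta\,d\phi
= \frac{\pi}{4} \int_0^{\pi/2} \int_3^4 \rho^2 \sin\phi\,d\rho\,d\phi \\
&= \frac{\pi}{4} \int_0^{\pi/2} \sin\phi \int_3^4 \rho^2\,d\rho\,d\phi
= \frac{\pi}{4} \int_0^{\pi/2} \sin\phi \left. \frac{\rho^3}{3} \right|_3^4 d\phi \\
&= \frac{\pi}{4} \cdot \frac{37}{3} \int_0^{\pi/2} \sin\phi\,d\phi
= -\frac{\pi}{4} \cdot \frac{37}{3} \left. \cos\phi \right|_0^{\pi/2}
= \frac{37\pi}{12}.
\end{aligned}
$$

□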
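A numerical sketch of integration over a “spherical rectangle”, assuming constant limits in ρ, θ and φ as in this example. It applies the midpoint rule in each spherical variable and multiplies by the factor ρ² sin φ. The helper name `spherical_box_integral` and the resolution `n = 40` are illustrative choices.

```python
import math


def spherical_box_integral(f, rho_range, theta_range, phi_range, n=40):
    """Midpoint rule for the integral of f(x, y, z) over the spherical box
    rho0 <= rho <= rho1, theta0 <= theta <= theta1, phi0 <= phi <= phi1,
    including the factor rho^2 sin(phi) from dV = rho^2 sin(phi) drho dtheta dphi."""
    (r0, r1), (t0, t1), (p0, p1) = rho_range, theta_range, phi_range
    hr, ht, hp = (r1 - r0) / n, (t1 - t0) / n, (p1 - p0) / n
    total = 0.0
    for i in range(n):
        rho = r0 + (i + 0.5) * hr
        for j in range(n):
            theta = t0 + (j + 0.5) * ht
            for k in range(n):
                phi = p0 + (k + 0.5) * hp
                x = rho * math.sin(phi) * math.cos(theta)
                y = rho * math.sin(phi) * math.sin(theta)
                z = rho * math.cos(phi)
                total += f(x, y, z) * rho * rho * math.sin(phi) * hr * ht * hp
    return total


# Volume: integrate f = 1 over 3 <= rho <= 4, 0 <= theta <= pi/4, 0 <= phi <= pi/2
V = spherical_box_integral(lambda x, y, z: 1.0,
                           (3.0, 4.0), (0.0, math.pi / 4), (0.0, math.pi / 2))
print(V, 37 * math.pi / 12)
```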

Example We use spherical coordinates to evaluate the triple integral

$$\iiint_H (x^2 + y^2)\,dV,$$

where H is the solid that is bounded below by the xy-plane, and bounded above by the sphere x² + y² + z² = 1. In spherical coordinates, H is defined by the inequalities

$$H = \{(\rho, \theta, \phi) \mid 0 \le \rho \le 1,\ 0 \le \theta \le 2\pi,\ 0 \le \phi \le \pi/2\}.$$

As the integrand x² + y² is equal to (ρ cos θ sin φ)² + (ρ sin θ sin φ)² = ρ² sin² φ in spherical coordinates, we have

$$\iiint_H (x^2 + y^2)\,dV = \int_0^{2\pi} \int_0^{\pi/2} \int_0^1 (\rho^2 \sin^2\phi)\,\rho^2 \sin\phi\,d\rho\,d\phi\,d\theta.$$


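Evaluating this integral, we obtain

$$
\begin{aligned}
\iiint_H (x^2 + y^2)\,dV
&= \int_0^{2\pi} \int_0^{\pi/2} \sin^3\phi \int_0^1 \rho^4\,d\rho\,d\phi\,d\theta
= \int_0^{2\pi} \int_0^{\pi/2} \sin^3\phi \left. \frac{\rho^5}{5} \right|_0^1 d\phi\,d\theta \\
&= \frac{1}{5} \int_0^{2\pi} \int_0^{\pi/2} \sin^3\phi\,d\phi\,d\theta
= \frac{1}{5} \int_0^{2\pi} \int_0^{\pi/2} \sin^2\phi \sin\phi\,d\phi\,d\theta \\
&= \frac{1}{5} \int_0^{2\pi} \int_0^{\pi/2} (1 - \cos^2\phi) \sin\phi\,d\phi\,d\theta \\
&= \frac{1}{5} \int_0^{2\pi} \int_0^{\pi/2} \sin\phi\,d\phi\,d\theta - \frac{1}{5} \int_0^{2\pi} \int_0^{\pi/2} \cos^2\phi \sin\phi\,d\phi\,d\theta \\
&= \frac{1}{5} \int_0^{2\pi} \left. (-\cos\phi) \right|_0^{\pi/2} d\theta - \frac{1}{5} \int_0^{2\pi} \int_0^1 u^2\,du\,d\theta, \qquad u = \cos\phi \\
&= \frac{1}{5} \int_0^{2\pi} 1\,d\theta - \frac{1}{5} \int_0^{2\pi} \left. \frac{u^3}{3} \right|_0^1 d\theta \\
&= \frac{1}{5} \left( 2\pi - \frac{2\pi}{3} \right)
= \frac{4\pi}{15}.
\end{aligned}
$$

□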
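Because the limits here are constants and the integrand (ρ² sin² φ) ρ² sin φ is a product of a function of ρ and a function of φ, the triple integral factors into three one-dimensional integrals (Fubini’s Theorem on a box). The sketch below checks each factor with composite Simpson’s rule; the helper `simpson` and its default `n` are our own choices.

```python
import math


def simpson(f, a, b, n=100):
    """Composite Simpson's rule with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + k * h) for k in range(1, n, 2))
    even = sum(f(a + k * h) for k in range(2, n, 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3


# (rho^2 sin^2 phi) * rho^2 sin(phi) = rho^4 * sin^3(phi), with constant limits,
# so the triple integral is a product of three 1-D integrals.
theta_part = simpson(lambda t: 1.0, 0.0, 2 * math.pi)
rho_part = simpson(lambda r: r ** 4, 0.0, 1.0)
phi_part = simpson(lambda p: math.sin(p) ** 3, 0.0, math.pi / 2)
value = theta_part * rho_part * phi_part
print(rho_part, phi_part, value, 4 * math.pi / 15)
```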

2.8 Change of Variables in Multiple Integrals

Recall that in single-variable calculus, if the integral

$$\int_a^b f(u)\,du$$

is evaluated by making a change of variable u = g(x), such that the interval α ≤ x ≤ β is mapped by g to the interval a ≤ u ≤ b, with g(α) = a and g(β) = b, then

$$\int_a^b f(u)\,du = \int_\alpha^\beta f(g(x))\,g'(x)\,dx.$$

The appearance of the factor g′(x) in the integrand is due to the fact that if we divide [a, b] into n subintervals $[u_{i-1}, u_i]$ of equal width ∆u = (b − a)/n, and if we divide [α, β] into n subintervals $[x_{i-1}, x_i]$ of equal width ∆x = (β − α)/n, then, by the Mean Value Theorem,

$$\Delta u = u_i - u_{i-1} = g(x_i) - g(x_{i-1}) = g'(x_i^*)\,\Delta x,$$

where $x_{i-1} \le x_i^* \le x_i$. We will now generalize this change of variable to multiple integrals.

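For simplicity, suppose that we wish to evaluate the double integral

$$\iint_D f(x, y)\,dA$$

by making a change of variable

$$x = g(u, v), \quad y = h(u, v), \qquad a \le u \le b, \quad c \le v \le d.$$

We divide the interval [a, b] into n subintervals $[u_{i-1}, u_i]$ of equal width ∆u = (b − a)/n, and we divide [c, d] into m subintervals $[v_{j-1}, v_j]$ of equal width ∆v = (d − c)/m. Then, the rectangle $[u_{i-1}, u_i] \times [v_{j-1}, v_j]$ is approximately mapped by g and h into a parallelogram with adjacent sides

$$
\begin{aligned}
\mathbf{r}_u &= \langle g(u_i, v_{j-1}) - g(u_{i-1}, v_{j-1}),\ h(u_i, v_{j-1}) - h(u_{i-1}, v_{j-1}) \rangle, \\
\mathbf{r}_v &= \langle g(u_{i-1}, v_j) - g(u_{i-1}, v_{j-1}),\ h(u_{i-1}, v_j) - h(u_{i-1}, v_{j-1}) \rangle.
\end{aligned}
$$

By the Mean Value Theorem, we have

$$\mathbf{r}_u \approx \langle g_u(u_{i-1}, v_{j-1}),\ h_u(u_{i-1}, v_{j-1}) \rangle\,\Delta u, \qquad \mathbf{r}_v \approx \langle g_v(u_{i-1}, v_{j-1}),\ h_v(u_{i-1}, v_{j-1}) \rangle\,\Delta v.$$

The area of this parallelogram is therefore approximately

$$|\mathbf{r}_u \times \mathbf{r}_v| \approx \left| \frac{\partial g}{\partial u}\frac{\partial h}{\partial v} - \frac{\partial g}{\partial v}\frac{\partial h}{\partial u} \right| \Delta u\,\Delta v.$$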

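It follows that

$$\iint_D f(x, y)\,dx\,dy = \iint_S f(g(u, v), h(u, v)) \left| \frac{\partial(x, y)}{\partial(u, v)} \right| du\,dv,$$

where S = [a, b] × [c, d] is the domain of g and h, which they map onto D, and

$$\frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{vmatrix} = \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u}$$

is the Jacobian of the transformation from (u, v) to (x, y). It is also the determinant of the Jacobian matrix of the vector-valued function that maps (u, v) to (x, y).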

Example Let D be the parallelogram with vertices (0, 0), (2, 4), (6, 1), and (8, 5). To integrate a function f(x, y) over D, we can use a change of variable (x, y) = (g(u, v), h(u, v)) that maps a rectangle to this parallelogram, and then integrate over the rectangle.

Using the vertices, we find that the equations of the edges are

$$-x + 6y = 0, \qquad -x + 6y = 22, \qquad 2x - y = 0, \qquad 2x - y = 11.$$

Therefore, if we define the new variables u and v by the equations

$$u = -x + 6y, \qquad v = 2x - y,$$

then, for (x, y) ∈ D, we have (u, v) belonging to the rectangle 0 ≤ u ≤ 22, 0 ≤ v ≤ 11.

To rewrite an integral over D in terms of u and v, it is much easier to express the original variables in terms of the new variables than the other way around. Therefore, we need to solve the equations defining u and v for x and y. From the equation for u, we have x = 6y − u. Substituting into the equation for v, we obtain v = 2(6y − u) − y = 11y − 2u, which yields y = h(u, v) = (2u + v)/11. Substituting this into x = 6y − u yields x = g(u, v) = (u + 6v)/11.

The Jacobian of this transformation is

$$\frac{\partial(x, y)}{\partial(u, v)} = \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u} = \frac{1}{11^2}\left[ 1(1) - 6(2) \right] = -\frac{1}{11}.$$

We conclude that

$$\iint_D f(x, y)\,dx\,dy = \frac{1}{11} \int_0^{11} \int_0^{22} f(g(u, v), h(u, v))\,du\,dv.$$

□
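The inverse map and its Jacobian can be checked with exact rational arithmetic. The sketch below (our own illustration, using Python’s standard `fractions` module) confirms that (x, y) = ((u + 6v)/11, (2u + v)/11) sends the corners of the rectangle 0 ≤ u ≤ 22, 0 ≤ v ≤ 11 to the vertices of D, and that |Jacobian| times the area of the rectangle equals the area of D.

```python
from fractions import Fraction as Fr


def g(u, v):
    """x = g(u, v) = (u + 6v)/11, from the example."""
    return Fr(u + 6 * v, 11)


def h(u, v):
    """y = h(u, v) = (2u + v)/11, from the example."""
    return Fr(2 * u + v, 11)


# Images of the corners of the rectangle 0 <= u <= 22, 0 <= v <= 11
corners = [(0, 0), (22, 0), (0, 11), (22, 11)]
images = [(g(u, v), h(u, v)) for u, v in corners]
print(images)

# The map is linear, so its partial derivatives are constants
x_u, x_v = Fr(1, 11), Fr(6, 11)
y_u, y_v = Fr(2, 11), Fr(1, 11)
jacobian = x_u * y_v - x_v * y_u
print(jacobian)   # -1/11

# Area of D from the cross product of the edge vectors (2, 4) and (6, 1)
area_D = abs(2 * 1 - 4 * 6)
print(abs(jacobian) * 22 * 11, area_D)
```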

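In general, when integrating a function $f(x_1, x_2, \dots, x_n)$ over a region D ⊂ Rⁿ, if the integral is evaluated using a one-to-one change of variable $(x_1, x_2, \dots, x_n) = \mathbf{g}(u_1, u_2, \dots, u_n)$ that maps a region E ⊂ Rⁿ onto D, then

$$\int_D f(x_1, \dots, x_n)\,dx_1 \cdots dx_n = \int_E (f \circ \mathbf{g})(u_1, \dots, u_n)\,\left| \det J_{\mathbf{g}}(u_1, \dots, u_n) \right|\,du_1 \cdots du_n,$$

where

$$J_{\mathbf{g}}(u_1, u_2, \dots, u_n) = \begin{bmatrix} \frac{\partial x_1}{\partial u_1} & \frac{\partial x_1}{\partial u_2} & \cdots & \frac{\partial x_1}{\partial u_n} \\ \frac{\partial x_2}{\partial u_1} & \frac{\partial x_2}{\partial u_2} & \cdots & \frac{\partial x_2}{\partial u_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial x_n}{\partial u_1} & \frac{\partial x_n}{\partial u_2} & \cdots & \frac{\partial x_n}{\partial u_n} \end{bmatrix}$$

is the Jacobian matrix of g and $\det J_{\mathbf{g}}(u_1, u_2, \dots, u_n)$ is its determinant, which is simply referred to as the Jacobian of the transformation g.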

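Example Consider the transformation from spherical to Cartesian coordinates,

$$x = \rho\sin\phi\cos\theta, \qquad y = \rho\sin\phi\sin\theta, \qquad z = \rho\cos\phi.$$

Then, the Jacobian matrix of this transformation is

$$\begin{bmatrix} \frac{\partial x}{\partial \rho} & \frac{\partial x}{\partial \theta} & \frac{\partial x}{\partial \phi} \\ \frac{\partial y}{\partial \rho} & \frac{\partial y}{\partial \theta} & \frac{\partial y}{\partial \phi} \\ \frac{\partial z}{\partial \rho} & \frac{\partial z}{\partial \theta} & \frac{\partial z}{\partial \phi} \end{bmatrix} = \begin{bmatrix} \sin\phi\cos\theta & -\rho\sin\phi\sin\theta & \rho\cos\phi\cos\theta \\ \sin\phi\sin\theta & \rho\sin\phi\cos\theta & \rho\cos\phi\sin\theta \\ \cos\phi & 0 & -\rho\sin\phi \end{bmatrix}.$$

It follows that the Jacobian of this transformation is given by the determinant of this matrix, which we expand along the third row:

$$
\begin{aligned}
\begin{vmatrix} \sin\phi\cos\theta & -\rho\sin\phi\sin\theta & \rho\cos\phi\cos\theta \\ \sin\phi\sin\theta & \rho\sin\phi\cos\theta & \rho\cos\phi\sin\theta \\ \cos\phi & 0 & -\rho\sin\phi \end{vmatrix}
&= \cos\phi \begin{vmatrix} -\rho\sin\phi\sin\theta & \rho\cos\phi\cos\theta \\ \rho\sin\phi\cos\theta & \rho\cos\phi\sin\theta \end{vmatrix} - \rho\sin\phi \begin{vmatrix} \sin\phi\cos\theta & -\rho\sin\phi\sin\theta \\ \sin\phi\sin\theta & \rho\sin\phi\cos\theta \end{vmatrix} \\
&= \cos\phi\left[ -\rho^2\sin\phi\cos\phi\sin^2\theta - \rho^2\sin\phi\cos\phi\cos^2\theta \right] - \rho\sin\phi\left[ \rho\sin^2\phi\cos^2\theta + \rho\sin^2\phi\sin^2\theta \right] \\
&= -\rho^2\cos^2\phi\sin\phi - \rho^2\sin^2\phi\sin\phi \\
&= -\rho^2\sin\phi.
\end{aligned}
$$

The absolute value of the Jacobian, ρ² sin φ (since 0 ≤ φ ≤ π), is the factor that must be included in the integrand when converting a triple integral from Cartesian to spherical coordinates. □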
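The cofactor expansion above can be spot-checked numerically. This sketch (our own, not from the notes) approximates the Jacobian matrix of the spherical-to-Cartesian map by central differences, with columns ordered (ρ, θ, φ) as in the matrix above, and compares its determinant with −ρ² sin φ at a few arbitrarily chosen points.

```python
import math


def spherical_to_cartesian(rho, theta, phi):
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))


def numerical_jacobian(F, p, h=1e-6):
    """3x3 Jacobian matrix of F at the point p by central differences;
    entry [i][j] approximates d(F_i)/d(p_j)."""
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        fp, fm = F(*plus), F(*minus)
        for i in range(3):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


checks = []
for rho, theta, phi in [(1.0, 0.3, 0.7), (2.5, 4.0, 2.2), (0.8, 1.1, 0.1)]:
    numeric = det3(numerical_jacobian(spherical_to_cartesian, (rho, theta, phi)))
    formula = -rho ** 2 * math.sin(phi)
    checks.append((numeric, formula))
    print(rho, theta, phi, numeric, formula)
```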

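Example We evaluate the double integral

$$\iint_R (x^2 - xy + y^2)\,dA,$$

where R is the region bounded by the ellipse x² − xy + y² = 2, using the change of variables

$$x = \sqrt{2}\,u - \sqrt{2/3}\,v, \qquad y = \sqrt{2}\,u + \sqrt{2/3}\,v.$$

First, we compute the Jacobian of the change of variables,

$$\frac{\partial(x, y)}{\partial(u, v)} = \det\begin{bmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{bmatrix} = \det\begin{bmatrix} \sqrt{2} & -\sqrt{2/3} \\ \sqrt{2} & \sqrt{2/3} \end{bmatrix} = \sqrt{2}\sqrt{2/3} + \sqrt{2}\sqrt{2/3} = \frac{4}{\sqrt{3}}.$$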

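Next, we need to describe the region R in terms of u and v. Writing x = a − b and y = a + b with a = √2 u and b = √(2/3) v, we find x² − xy + y² = a² + 3b² = 2u² + 2v², so the equation x² − xy + y² = 2 becomes 2u² + 2v² = 2, that is, u² + v² = 1. It follows that the change of variables maps the unit disk S = {(u, v) | u² + v² ≤ 1} onto R. If we then use polar coordinates u = r cos θ and v = r sin θ, we have

$$\iint_R (x^2 - xy + y^2)\,dA = \iint_S (2u^2 + 2v^2)\frac{4}{\sqrt{3}}\,du\,dv = \frac{4}{\sqrt{3}} \int_0^{2\pi} \int_0^1 (2r^2)\,r\,dr\,d\theta.$$

Evaluating this integral, we obtain

$$
\begin{aligned}
\iint_R (x^2 - xy + y^2)\,dA
&= \frac{8}{\sqrt{3}} \int_0^{2\pi} \int_0^1 r^3\,dr\,d\theta
= \frac{8}{\sqrt{3}} \int_0^{2\pi} \left. \frac{r^4}{4} \right|_0^1 d\theta \\
&= \frac{2}{\sqrt{3}} \int_0^{2\pi} 1\,d\theta
= \frac{4\pi}{\sqrt{3}}.
\end{aligned}
$$

□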
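As an independent check that avoids the (u, v) substitution entirely, the sketch below integrates x² − xy + y² over R directly in the xy-plane using ordinary polar coordinates. Along the ray at angle t, x² − xy + y² = r²(1 − cos t sin t), so R is 0 ≤ r ≤ √(2/(1 − cos t sin t)). The sketch also verifies the Jacobian 4/√3 and the identity x² − xy + y² = 2u² + 2v² at a few sample points. All helper names and resolutions are our own.

```python
import math

SQ2, SQ23 = math.sqrt(2.0), math.sqrt(2.0 / 3.0)


def xy_from_uv(u, v):
    """The change of variables from the example."""
    return SQ2 * u - SQ23 * v, SQ2 * u + SQ23 * v


def q(x, y):
    return x * x - x * y + y * y


# The map is linear, so its Jacobian is det [[SQ2, -SQ23], [SQ2, SQ23]].
jacobian = SQ2 * SQ23 - (-SQ23) * SQ2


def integral_in_xy(n_t=400, n_r=400):
    """Midpoint rule in ordinary polar coordinates (x, y) = (r cos t, r sin t)
    over R = {0 <= r <= sqrt(2 / (1 - cos t sin t))}, with the factor r."""
    ht = 2 * math.pi / n_t
    total = 0.0
    for i in range(n_t):
        t = (i + 0.5) * ht
        ct, st = math.cos(t), math.sin(t)
        R = math.sqrt(2.0 / (1.0 - ct * st))
        hr = R / n_r
        for j in range(n_r):
            r = (j + 0.5) * hr
            total += q(r * ct, r * st) * r * hr * ht
    return total


value = integral_in_xy()
print(jacobian, 4 / math.sqrt(3))
print(value, 4 * math.pi / math.sqrt(3))
```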

Example We wish to use an appropriate change of variable to evaluate the double integral

$$\iint_R (x + y)e^{x^2 - y^2}\,dA,$$

where R is the rectangle enclosed by the lines x − y = 0, x − y = 2, x + y = 0 and x + y = 3. If we define u = x + y and v = x − y, then R is mapped by this change of variables to the rectangle

$$S = \{(u, v) \mid 0 \le u \le 3,\ 0 \le v \le 2\}.$$

Solving for x and y in terms of u and v, we obtain

$$x = \frac{1}{2}(u + v), \qquad y = \frac{1}{2}(u - v).$$

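It follows that

$$\frac{\partial(x, y)}{\partial(u, v)} = \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u} = \frac{1}{2}\left( -\frac{1}{2} \right) - \frac{1}{2} \cdot \frac{1}{2} = -\frac{1}{2},$$

and, since x² − y² = (x + y)(x − y) = uv, the integral becomes

$$\iint_R (x + y)e^{x^2 - y^2}\,dA = \iint_R (x + y)e^{(x + y)(x - y)}\,dA = \int_0^3 \int_0^2 u\,e^{uv} \left| -\frac{1}{2} \right| dv\,du.$$

Evaluating this integral, we obtain

$$
\begin{aligned}
\iint_R (x + y)e^{x^2 - y^2}\,dA
&= \frac{1}{2} \int_0^3 \int_0^2 u\,e^{uv}\,dv\,du
= \frac{1}{2} \int_0^3 \left. e^{uv} \right|_0^2 du
= \frac{1}{2} \int_0^3 \left[ e^{2u} - 1 \right] du \\
&= \frac{1}{2} \left. \left[ \frac{e^{2u}}{2} - u \right] \right|_0^3
= \frac{1}{2} \left( \frac{e^6}{2} - 3 - \frac{1}{2} \right)
= \frac{1}{4}(e^6 - 7).
\end{aligned}
$$

□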
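A quick numerical confirmation of the final value: the sketch applies the two-dimensional midpoint rule to the transformed integrand u e^{uv} · |−1/2| on the rectangle 0 ≤ u ≤ 3, 0 ≤ v ≤ 2 and compares with (e⁶ − 7)/4. The helper `midpoint_rect` and the grid size are illustrative choices.

```python
import math


def midpoint_rect(f, a, b, c, d, n=400):
    """Midpoint rule for the integral of f(u, v) over the rectangle [a, b] x [c, d]."""
    hu, hv = (b - a) / n, (d - c) / n
    total = sum(f(a + (i + 0.5) * hu, c + (j + 0.5) * hv)
                for i in range(n) for j in range(n))
    return total * hu * hv


def integrand_uv(u, v):
    """(x + y) e^{x^2 - y^2} = u e^{uv}, times |d(x, y)/d(u, v)| = 1/2."""
    return 0.5 * u * math.exp(u * v)


value = midpoint_rect(integrand_uv, 0.0, 3.0, 0.0, 2.0)
exact = (math.exp(6) - 7) / 4
print(value, exact)
```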

Chapter 3

Vector Calculus

3.1 Vector Fields

To this point, we have mostly worked with scalar-valued functions of several variables, in the interest of computing quantities such as the maximum or minimum value of a function, or the volume or center of mass of a solid. Now, we will study applications involving vector-valued functions of several variables. The difficulty of visualizing such functions leads to the notion of a vector field.

A vector field on a set U ⊆ Rⁿ is a function F : U → Rⁿ that assigns to each point x ∈ U a vector

$$\mathbf{F}(\mathbf{x}) = \langle F_1(\mathbf{x}), F_2(\mathbf{x}), \dots, F_n(\mathbf{x}) \rangle$$

in Rⁿ. The functions $F_1, F_2, \dots, F_n$ are the component functions, or component scalar fields, of F. For our purposes, n = 2 or 3. To visualize a vector field, one can plot the vector F(x) at any given point x, using the component functions to obtain the components of the vector to be plotted at each point.

The following are certain vector fields of interest in applications:

• Given a fluid, for example, a velocity field is a vector field V(x, y, z) that indicates the velocity of the fluid at each point (x, y, z). When plotting a velocity field, the speed of the fluid at each point is indicated by the length of the vector plotted at that point, and the direction of the fluid’s motion at that point is indicated by the direction of the vector.

A curve c(t) is said to be a flow line, or streamline, of a velocity field V if, for each value of the parameter t,

$$\mathbf{c}'(t) = \mathbf{V}(\mathbf{c}(t)).$$


Figure 3.1: The vector field V(x, y) = 〈−y, x〉

That is, at each point along the curve, its tangent vector coincides with V. A flow line can be approximated by first choosing an initial point $\mathbf{x}_0 = \mathbf{c}(t_0)$, then using the value of V at that point to approximate a second point $\mathbf{x}_1 = \mathbf{c}(t_1)$ as follows:

$$\frac{\mathbf{x}_1 - \mathbf{x}_0}{t_1 - t_0} = \frac{\mathbf{c}(t_1) - \mathbf{c}(t_0)}{t_1 - t_0} \approx \mathbf{V}(\mathbf{c}(t_0)) \implies \mathbf{x}_1 \approx \mathbf{x}_0 + (t_1 - t_0)\mathbf{V}(\mathbf{x}_0).$$

This can be continued to obtain the locations of any number of points along the flow line. The closer the times $t_0, t_1, \dots$ are to one another, the more accurate the approximate flow line will be.

• Consider two objects with masses m and M, with the object of mass M located at the origin, and the vector field F defined by

$$\mathbf{F}(\mathbf{r}) = -\frac{mMG}{\|\mathbf{r}\|^3}\,\mathbf{r},$$

where r is the position vector of the object of mass m, and G is the gravitational constant. This vector field indicates the gravitational force exerted by the object at the origin on the object at position r, and is therefore an example of a gravitational field.

• Suppose an electric charge Q is located at the origin, and a charge q is located at the point with position vector x. Then the electric force exerted by the first charge on the second is given by the vector field

$$\mathbf{F}(\mathbf{x}) = \frac{\varepsilon qQ}{\|\mathbf{x}\|^3}\,\mathbf{x},$$

where ε is a constant. This field, and the gravitational field described above, are both examples of force fields.

Figure 3.2: The conservative vector field F(x, y) = 〈y, x〉

• A vector field F is said to be conservative if F = ∇f for some function f. We also say that F is a gradient field, and f is a potential function for F. When we discuss line integrals, we will learn the physical meaning of a conservative vector field.

In upcoming sections we will learn how to integrate vector fields, as well as the physical interpretations of such integrals.

Example Consider the velocity field V(x, y) = ⟨−y, x⟩, which is shown in Figure 3.1. It can be seen from the figure that the flow lines of this velocity field are circles centered at the origin. □
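The step-by-step approximation of a flow line described earlier (repeatedly setting x₁ ≈ x₀ + (t₁ − t₀)V(x₀)) is Euler’s method. The sketch below applies it to V(x, y) = ⟨−y, x⟩ starting at (1, 0), where the exact flow line is c(t) = (cos t, sin t), and measures the error at t = π/2 for several step counts. The function names are our own.

```python
import math


def V(x, y):
    """Velocity field V(x, y) = <-y, x> from the example."""
    return -y, x


def euler_flow_line(field, x0, y0, t0, t1, steps):
    """Approximate the flow line through (x0, y0) at time t0 by repeatedly
    applying x_{k+1} = x_k + (t_{k+1} - t_k) V(x_k) with equal time steps."""
    dt = (t1 - t0) / steps
    x, y = x0, y0
    points = [(x, y)]
    for _ in range(steps):
        vx, vy = field(x, y)
        x, y = x + dt * vx, y + dt * vy
        points.append((x, y))
    return points


# The exact flow line with c(0) = (1, 0) is c(t) = (cos t, sin t).
T = math.pi / 2
errors = {}
for steps in (10, 100, 1000):
    xe, ye = euler_flow_line(V, 1.0, 0.0, 0.0, T, steps)[-1]
    errors[steps] = math.hypot(xe - math.cos(T), ye - math.sin(T))
    print(steps, (xe, ye), errors[steps])
```

For this field each Euler step multiplies the distance from the origin by √(1 + ∆t²), so the approximate flow line spirals slightly outward; the error shrinks roughly in proportion to the step size, consistent with the remark that closer times give a more accurate flow line.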

Example The vector field F(x, y) = ⟨y, x⟩ is conservative, because F = ∇f, where f(x, y) = xy. The field is shown in Figure 3.2. It should be noted that conservative vector fields are also irrotational; a fluid whose velocity field is conservative has no vorticity. □

3.2 Line Integrals

Recall from single-variable calculus that if a constant force F is applied to an object to move it along a straight line from x = a to x = b, then the amount of work done is the force times the distance, W = F(b − a). More generally, if the force is not constant, but is instead dependent on x so that the amount of force applied when the object is at the point x is given by F(x), then the work done is given by the integral

$$W = \int_a^b F(x)\,dx.$$

This result is obtained by applying the “basic” formula for work along each of n subintervals of width ∆x = (b − a)/n, and taking the limit as ∆x → 0.

Now, suppose that a force is applied to an object to move it along a path traced by a curve C, instead of moving it along a straight line. If the amount of force that is being applied to the object at any point p on the curve C is given by the value of a function F(p), then the work can be approximated by, as before, applying the “basic” formula for work to each of n line segments that approximate the curve and have lengths $\Delta s_1, \Delta s_2, \dots, \Delta s_n$. The work done on the ith segment is approximately $F(p_i^*)\,\Delta s_i$, where $p_i^*$ is any point on the segment. By taking the limit as $\max \Delta s_i \to 0$, we obtain the line integral

$$W = \int_C F(p)\,ds = \lim_{\max \Delta s_i \to 0} \sum_{i=1}^{n} F(p_i^*)\,\Delta s_i,$$

provided that this limit exists.

In order to actually evaluate a line integral, it is necessary to express the curve C in terms of parametric equations. For concreteness, we assume that C is a plane curve defined by the parametric equations

$$x = x(t), \quad y = y(t), \qquad a \le t \le b.$$

Then, if we divide [a, b] into n subintervals $[t_{i-1}, t_i]$ of width ∆t = (b − a)/n, where $t_i = a + i\,\Delta t$, we can approximate C by n line segments with endpoints $(x(t_{i-1}), y(t_{i-1}))$ and $(x(t_i), y(t_i))$, for i = 1, 2, ..., n. From the Pythagorean Theorem, it follows that the ith segment has length

$$\Delta s_i = \sqrt{\Delta x_i^2 + \Delta y_i^2} = \sqrt{\left( \frac{\Delta x_i}{\Delta t} \right)^2 + \left( \frac{\Delta y_i}{\Delta t} \right)^2}\,\Delta t,$$

where $\Delta x_i = x(t_i) - x(t_{i-1})$ and $\Delta y_i = y(t_i) - y(t_{i-1})$. Letting ∆t → 0, we obtain

$$\int_C F(p)\,ds = \int_a^b F(x(t), y(t)) \sqrt{\left( \frac{dx}{dt} \right)^2 + \left( \frac{dy}{dt} \right)^2}\,dt.$$

We recall that if F(x, y) ≡ 1, then this integral yields the arc length of the curve C.

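Example (Stewart, Section 13.2, Exercise 8) To evaluate the line integral

$$\int_C x^2 z\,ds,$$

where C is the line segment from (0, 6, −1) to (4, 1, 5), we first need parametric equations for the line segment. Using the vector between the endpoints,

$$\mathbf{v} = \langle 4 - 0,\ 1 - 6,\ 5 - (-1) \rangle = \langle 4, -5, 6 \rangle,$$

we obtain the parametric equations

$$x = 4t, \quad y = 6 - 5t, \quad z = -1 + 6t, \qquad 0 \le t \le 1.$$

It follows that

$$
\begin{aligned}
\int_C x^2 z\,ds
&= \int_0^1 (x(t))^2 z(t) \sqrt{[x'(t)]^2 + [y'(t)]^2 + [z'(t)]^2}\,dt
= \int_0^1 (4t)^2 (6t - 1) \sqrt{4^2 + (-5)^2 + 6^2}\,dt \\
&= \int_0^1 16t^2 (6t - 1)\sqrt{77}\,dt
= 16\sqrt{77} \int_0^1 (6t^3 - t^2)\,dt
= 16\sqrt{77} \left. \left( \frac{6t^4}{4} - \frac{t^3}{3} \right) \right|_0^1 \\
&= 16\sqrt{77} \left( \frac{3}{2} - \frac{1}{3} \right)
= \frac{56\sqrt{77}}{3}.
\end{aligned}
$$

□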
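The recipe “parametrize, then integrate F(r(t)) ‖r′(t)‖ dt” can be sketched generically. In this illustration (our own helper, not from the notes), ds = ‖r′(t)‖ dt is computed with a central-difference approximation of r′(t), so only the parametrization needs to be supplied; the t-integral uses the midpoint rule.

```python
import math


def scalar_line_integral(f, r, a, b, n=2000, h=1e-6):
    """Approximate the line integral of f over C: r(t), a <= t <= b, with ds = |r'(t)| dt.
    Uses the midpoint rule in t and a central difference for r'(t)."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        ahead, behind = r(t + h), r(t - h)
        speed = math.sqrt(sum(((p - q) / (2 * h)) ** 2 for p, q in zip(ahead, behind)))
        total += f(*r(t)) * speed * dt
    return total


def segment(t):
    """Segment from (0, 6, -1) to (4, 1, 5)."""
    return 4 * t, 6 - 5 * t, -1 + 6 * t


value = scalar_line_integral(lambda x, y, z: x * x * z, segment, 0.0, 1.0)
print(value, 56 * math.sqrt(77) / 3)
```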

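Example (Stewart, Section 13.2, Exercise 10) We evaluate the line integral

$$\int_C (2x + 9z)\,ds,$$

where C is defined by the parametric equations

$$x = t, \quad y = t^2, \quad z = t^3, \qquad 0 \le t \le 1.$$

We have

$$
\begin{aligned}
\int_C (2x + 9z)\,ds
&= \int_0^1 (2x(t) + 9z(t)) \sqrt{[x'(t)]^2 + [y'(t)]^2 + [z'(t)]^2}\,dt
= \int_0^1 (2t + 9t^3) \sqrt{1^2 + (2t)^2 + (3t^2)^2}\,dt \\
&= \int_0^1 (2t + 9t^3) \sqrt{1 + 4t^2 + 9t^4}\,dt \\
&= \frac{1}{4} \int_1^{14} u^{1/2}\,du, \qquad u = 1 + 4t^2 + 9t^4,\quad du = 4(2t + 9t^3)\,dt \\
&= \frac{1}{4} \cdot \left. \frac{2}{3} u^{3/2} \right|_1^{14}
= \frac{1}{6}\left( 14^{3/2} - 1 \right).
\end{aligned}
$$

□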
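A short check of this result, offered as an illustration only: the integrand (2t + 9t³)√(1 + 4t² + 9t⁴) is integrated over [0, 1] with composite Simpson’s rule (our own helper) and compared with the closed form (14^{3/2} − 1)/6 obtained from the substitution u = 1 + 4t² + 9t⁴.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    odd = sum(f(a + k * h) for k in range(1, n, 2))
    even = sum(f(a + k * h) for k in range(2, n, 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3


def integrand(t):
    """(2x + 9z) |r'(t)| along x = t, y = t^2, z = t^3."""
    return (2 * t + 9 * t ** 3) * math.sqrt(1 + 4 * t ** 2 + 9 * t ** 4)


value = simpson(integrand, 0.0, 1.0)
exact = (14 ** 1.5 - 1) / 6
print(value, exact)
```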

Although we have introduced line integrals in the context of computing work, this approach can be used to integrate any function along a curve. For example, to compute the mass of a wire that is shaped like a plane curve C, where the density of the wire is given by a function ρ(x, y) defined at each point (x, y) on C, we can evaluate the line integral

$$m = \int_C \rho(x, y)\,ds.$$

It follows that the center of mass of the wire is the point (x̄, ȳ), where

$$\bar{x} = \frac{1}{m} \int_C x\,\rho(x, y)\,ds, \qquad \bar{y} = \frac{1}{m} \int_C y\,\rho(x, y)\,ds.$$

Now, suppose that a vector-valued force F is applied to an object to move it along the path traced by a plane curve C. If we approximate the curve by line segments, as before, the work done along the ith segment is approximately given by

$$W_i = \mathbf{F}(p_i^*) \cdot \left[ \mathbf{T}(p_i^*)\,\Delta s_i \right],$$

where $p_i^*$ is a point on the segment, and $\mathbf{T}(p_i^*)$ is the unit tangent vector to the curve at this point. That is, F · T = ‖F‖ cos θ is the component of the force in the direction of motion at each point on the curve, where θ is the angle between F and the direction of the curve, which is indicated by T. In the limit as max ∆sᵢ → 0, we obtain the line integral of F along C,

$$\int_C \mathbf{F} \cdot \mathbf{T}\,ds.$$

If the curve C is parametrized by the vector equation r(t) = ⟨x(t), y(t)⟩, where a ≤ t ≤ b, then the unit tangent vector is given by

$$\mathbf{T}(t) = \frac{\mathbf{r}'(t)}{\|\mathbf{r}'(t)\|},$$

and, as before, $ds = \sqrt{[x'(t)]^2 + [y'(t)]^2}\,dt = \|\mathbf{r}'(t)\|\,dt$. It follows that

$$\int_C \mathbf{F} \cdot \mathbf{T}\,ds = \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \frac{\mathbf{r}'(t)}{\|\mathbf{r}'(t)\|}\,\|\mathbf{r}'(t)\|\,dt = \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\,dt = \int_C \mathbf{F} \cdot d\mathbf{r}.$$

The last form of the line integral is merely an abbreviation that is used for convenience. As with line integrals of scalar-valued functions, the parametric representation of the curve is necessary for actual evaluation of a line integral.

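Example (Stewart, Section 13.2, Exercise 20) We evaluate the line integral

$$\int_C \mathbf{F} \cdot d\mathbf{r},$$

where F(x, y, z) = ⟨z, y, −x⟩ and C is the curve defined by the parametric vector equation

$$\mathbf{r}(t) = \langle x(t), y(t), z(t) \rangle = \langle t, \sin t, \cos t \rangle, \qquad 0 \le t \le \pi.$$

We have

$$
\begin{aligned}
\int_C \mathbf{F} \cdot d\mathbf{r}
&= \int_0^\pi \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\,dt
= \int_0^\pi \langle z(t), y(t), -x(t) \rangle \cdot \langle x'(t), y'(t), z'(t) \rangle\,dt \\
&= \int_0^\pi \langle \cos t, \sin t, -t \rangle \cdot \langle 1, \cos t, -\sin t \rangle\,dt
= \int_0^\pi \left[ \cos t + \sin t\cos t + t\sin t \right] dt \\
&= \int_0^\pi \cos t\,dt + \int_0^\pi \sin t\cos t\,dt + \int_0^\pi t\sin t\,dt \\
&= \left. \sin t \right|_0^\pi + \left. \frac{1}{2}\sin^2 t \right|_0^\pi - \left. t\cos t \right|_0^\pi + \int_0^\pi \cos t\,dt \\
&= \pi.
\end{aligned}
$$

□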
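The formula ∫_C F · dr = ∫_a^b F(r(t)) · r′(t) dt can be evaluated numerically once r and r′ are known. This sketch (helper name our own) uses the midpoint rule in t for the example above and should reproduce the value π.

```python
import math


def vector_line_integral(F, r, dr, a, b, n=2000):
    """Midpoint-rule approximation of int_a^b F(r(t)) . r'(t) dt."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        Fv = F(*r(t))
        total += sum(fc * dc for fc, dc in zip(Fv, dr(t))) * h
    return total


def F(x, y, z):
    return z, y, -x


def r(t):
    return t, math.sin(t), math.cos(t)


def dr(t):
    return 1.0, math.cos(t), -math.sin(t)


value = vector_line_integral(F, r, dr, 0.0, math.pi)
print(value, math.pi)
```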

If we write F(x, y) = ⟨P(x, y), Q(x, y)⟩, where P and Q are the component functions of F, then we have

$$
\begin{aligned}
\int_C \mathbf{F} \cdot d\mathbf{r}
&= \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\,dt
= \int_a^b \langle P(x(t), y(t)), Q(x(t), y(t)) \rangle \cdot \langle x'(t), y'(t) \rangle\,dt \\
&= \int_a^b P(x(t), y(t))\,x'(t)\,dt + \int_a^b Q(x(t), y(t))\,y'(t)\,dt.
\end{aligned}
$$

When the curve is approximated by n line segments, as before, the difference in the x-coordinates of each segment is, by the Mean Value Theorem,

$$\Delta x_i = x(t_i) - x(t_{i-1}) = x'(t_i^*)\,\Delta t,$$

where $t_{i-1} \le t_i^* \le t_i$. For this reason, we write

$$\int_a^b P(x(t), y(t))\,x'(t)\,dt = \int_C P\,dx, \qquad \int_a^b Q(x(t), y(t))\,y'(t)\,dt = \int_C Q\,dy,$$

and conclude

$$\int_C \mathbf{F} \cdot d\mathbf{r} = \int_C P\,dx + Q\,dy.$$

These line integrals of scalar-valued functions can be evaluated individually to obtain the line integral of the vector field F over C. However, it is important to note that unlike line integrals with respect to the arc length s, the value of line integrals with respect to x or y (or z, in 3-D) depends on the orientation of C. If the curve is traced in reverse (that is, from the terminal point to the initial point), then the sign of the line integral is reversed as well. We denote by −C the curve C with its orientation reversed. We then have

$$\int_C \mathbf{F} \cdot d\mathbf{r} = -\int_{-C} \mathbf{F} \cdot d\mathbf{r},$$

and

$$\int_C P\,dx = -\int_{-C} P\,dx, \qquad \int_C Q\,dy = -\int_{-C} Q\,dy.$$

All of this discussion generalizes to space curves (that is, curves in 3-D) in a straightforward manner, as illustrated in the examples.

Example (Stewart, Section 13.2, Exercise 6) Let $\mathbf{F}(x, y) = \langle \sin x, \cos y\rangle$ and let $C$ be the curve consisting of the top half of the circle $x^2 + y^2 = 1$, traversed counterclockwise from $(1, 0)$ to $(-1, 0)$, followed by the line segment from $(-1, 0)$ to $(-2, 3)$. To evaluate the line integral

$$\int_C \mathbf{F}\cdot\mathbf{T}\,ds = \int_C \sin x\,dx + \cos y\,dy,$$

we consider the integrals over the semicircle, denoted by $C_1$, and the line segment, denoted by $C_2$, separately. We then have

$$\int_C \sin x\,dx + \cos y\,dy = \int_{C_1} \sin x\,dx + \cos y\,dy + \int_{C_2} \sin x\,dx + \cos y\,dy.$$

For the semicircle, we use the parametric equations

$$x = \cos t, \quad y = \sin t, \quad 0 \le t \le \pi.$$

This yields

$$
\begin{aligned}
\int_{C_1} \sin x\,dx + \cos y\,dy &= \int_0^\pi \sin(\cos t)(-\sin t)\,dt + \cos(\sin t)\cos t\,dt \\
&= -\cos(\cos t)\Big|_0^\pi + \sin(\sin t)\Big|_0^\pi \\
&= -\cos(-1) + \cos(1) \\
&= 0.
\end{aligned}
$$

For the line segment, we use the parametric equations

$$x = -1 - t, \quad y = 3t, \quad 0 \le t \le 1.$$

This yields

$$
\begin{aligned}
\int_{C_2} \sin x\,dx + \cos y\,dy &= \int_0^1 \sin(-1-t)(-1)\,dt + \cos(3t)(3)\,dt \\
&= -\cos(-1-t)\Big|_0^1 + \sin(3t)\Big|_0^1 \\
&= -\cos(-2) + \cos(-1) + \sin(3) - \sin(0) \\
&= -\cos(2) + \cos(1) + \sin(3).
\end{aligned}
$$

We conclude

$$\int_C \sin x\,dx + \cos y\,dy = \cos(1) - \cos(2) + \sin(3).$$

In evaluating these integrals, we have taken advantage of the rule

$$\int_a^b f'(g(t))\,g'(t)\,dt = f(g(b)) - f(g(a)),$$

which follows from the Fundamental Theorem of Calculus and the Chain Rule. However, this shortcut can only be applied when each integrand involves only one of the variables, as here, where $\sin x$ depends only on $x$ and $\cos y$ only on $y$. □
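The piecewise evaluation can be checked numerically, one piece at a time. This is a sketch under our own assumptions (Simpson's rule, hypothetical helper `piece`), using exactly the two parameterizations from the example.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule for a scalar function g on [a, b] (n even)."""
    h = (b - a) / n
    s = g(a) + g(b)
    s += 4 * sum(g(a + k * h) for k in range(1, n, 2))
    s += 2 * sum(g(a + k * h) for k in range(2, n, 2))
    return s * h / 3

def piece(x, y, dx, dy, a, b):
    """Integral of sin(x) dx + cos(y) dy along one parametrized piece of C."""
    return simpson(lambda t: math.sin(x(t)) * dx(t) + math.cos(y(t)) * dy(t), a, b)

# C1: upper semicircle from (1, 0) to (-1, 0)
c1 = piece(math.cos, math.sin, lambda t: -math.sin(t), math.cos, 0.0, math.pi)
# C2: segment from (-1, 0) to (-2, 3)
c2 = piece(lambda t: -1 - t, lambda t: 3 * t, lambda t: -1.0, lambda t: 3.0, 0.0, 1.0)

exact = math.cos(1) - math.cos(2) + math.sin(3)
print(c1, c2, c1 + c2, exact)
```

The value over $C_1$ is zero up to rounding, and the total agrees with $\cos(1) - \cos(2) + \sin(3)$.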

Example (Stewart, Section 13.2, Exercise 12) We evaluate the line integral $\int_C \mathbf{F}\cdot d\mathbf{r}$, where

$$\mathbf{F}(x, y, z) = \langle P(x, y, z), Q(x, y, z), R(x, y, z)\rangle = \langle z, x, y\rangle,$$

and $C$ is defined by the parametric equations

$$x = t^2, \quad y = t^3, \quad z = t^2, \quad 0 \le t \le 1.$$


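We have

$$
\begin{aligned}
\int_C \mathbf{F}\cdot d\mathbf{r} &= \int_C P\,dx + Q\,dy + R\,dz \\
&= \int_0^1 z(t)\,x'(t)\,dt + x(t)\,y'(t)\,dt + y(t)\,z'(t)\,dt \\
&= \int_0^1 t^2(2t)\,dt + t^2(3t^2)\,dt + t^3(2t)\,dt \\
&= \int_0^1 2t^3\,dt + 3t^4\,dt + 2t^4\,dt \\
&= \int_0^1 (5t^4 + 2t^3)\,dt \\
&= \left(5\frac{t^5}{5} + 2\frac{t^4}{4}\right)\Bigg|_0^1 \\
&= \frac{3}{2}.
\end{aligned}
$$

□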
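Because every coordinate function here is a polynomial in $t$, the integrand $z\,x' + x\,y' + y\,z'$ is a polynomial, and the integral can be computed exactly with rational arithmetic. A minimal sketch, assuming polynomials are stored as coefficient lists $[c_0, c_1, \ldots]$; the helpers `mul`, `add`, `deriv` and `integrate_0_1` are our own, not from any library.

```python
from fractions import Fraction

# A polynomial is a list of coefficients: [c0, c1, c2, ...] means c0 + c1 t + c2 t^2 + ...
def mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def add(*polys):
    out = [Fraction(0)] * max(len(p) for p in polys)
    for p in polys:
        for i, c in enumerate(p):
            out[i] += c
    return out

def deriv(p):
    return [i * c for i, c in enumerate(p)][1:] or [Fraction(0)]

def integrate_0_1(p):
    """Exact integral of the polynomial p over [0, 1]."""
    return sum(Fraction(c) / (i + 1) for i, c in enumerate(p))

# x = t^2, y = t^3, z = t^2, and F = <P, Q, R> = <z, x, y>
x = [Fraction(c) for c in (0, 0, 1)]
y = [Fraction(c) for c in (0, 0, 0, 1)]
z = [Fraction(c) for c in (0, 0, 1)]

# P dx + Q dy + R dz = (z x' + x y' + y z') dt
integrand = add(mul(z, deriv(x)), mul(x, deriv(y)), mul(y, deriv(z)))
print(integrand)                  # coefficients of 2t^3 + 5t^4
print(integrate_0_1(integrand))   # 3/2
```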

3.3 The Fundamental Theorem for Line Integrals

We have learned that the line integral of a vector field $\mathbf{F}$ over a piecewise smooth curve $C$ that is parameterized by a vector-valued function $\mathbf{r}(t)$, $a \le t \le b$, is given by

$$\int_C \mathbf{F}\cdot d\mathbf{r} = \int_a^b \mathbf{F}(\mathbf{r}(t))\cdot \mathbf{r}'(t)\,dt.$$

Now, suppose that $\mathbf{F}$ is continuous and is a conservative vector field; that is, $\mathbf{F} = \nabla f$ for some scalar-valued function $f$. Then, by the Chain Rule, we have

$$\int_C \mathbf{F}\cdot d\mathbf{r} = \int_a^b \nabla f(\mathbf{r}(t))\cdot \mathbf{r}'(t)\,dt = \int_a^b \frac{d}{dt}\big[(f\circ\mathbf{r})(t)\big]\,dt = (f\circ\mathbf{r})(t)\Big|_a^b = f(\mathbf{r}(b)) - f(\mathbf{r}(a)).$$

This is the Fundamental Theorem of Line Integrals, which is a generalization of the Fundamental Theorem of Calculus.

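If $C$ is a closed curve, that is, its initial and terminal points are the same, then $\mathbf{r}(b) = \mathbf{r}(a)$, which yields

$$\int_C \mathbf{F}\cdot d\mathbf{r} = f(\mathbf{r}(b)) - f(\mathbf{r}(a)) = 0.$$

If we decompose $C$ into two curves $C_1$ and $C_2$, and use the fact that the sign of the line integral of a vector field over a curve depends on the orientation of the curve, then we have

$$\int_C \mathbf{F}\cdot d\mathbf{r} = \int_{C_1} \mathbf{F}\cdot d\mathbf{r} + \int_{C_2} \mathbf{F}\cdot d\mathbf{r} = \int_{C_1} \mathbf{F}\cdot d\mathbf{r} - \int_{-C_2} \mathbf{F}\cdot d\mathbf{r} = 0.$$

That is,

$$\int_{C_1} \mathbf{F}\cdot d\mathbf{r} = \int_{-C_2} \mathbf{F}\cdot d\mathbf{r}.$$

Now, $C_1$ and $-C_2$ have the same initial and terminal points, and any two paths with common endpoints can be combined in this way into a closed curve. It follows that if $\mathbf{F}$ is conservative within an open, connected domain $D$ (so that any two points in $D$ can be connected by a path that lies within $D$), then the line integral of $\mathbf{F}$ is independent of path in $D$; that is, the value of the line integral of $\mathbf{F}$ over a path $C$ depends only on its initial and terminal points.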

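The converse of this statement is also true: if the line integral of a vector field $\mathbf{F}$ is independent of path within an open, connected domain $D$, then $\mathbf{F}$ is a conservative vector field on $D$. To see this, we consider the two-variable case and let $D$ be a region in $\mathbb{R}^2$. Furthermore, we let $\mathbf{F}(x, y) = \langle P(x, y), Q(x, y)\rangle$. We choose an arbitrary point $(a, b) \in D$, and define

$$f(x, y) = \int_{(a,b)}^{(x,y)} \mathbf{F}\cdot d\mathbf{r}.$$

Since this line integral is independent of path, we can define $f(x, y)$ using any path between $(a, b)$ and $(x, y)$ that we choose, knowing that its value at $(x, y)$ will be the same in any case.

By choosing a path that ends with a horizontal line segment from $(x_1, y)$ to $(x, y)$ contained entirely in $D$, parametrized by $x = t$, $y = y$, for $x_1 \le t \le x$, we can show that

$$
\begin{aligned}
\frac{\partial f}{\partial x}(x, y) &= \frac{\partial}{\partial x}\left[\int_{(a,b)}^{(x_1,y)} \mathbf{F}\cdot d\mathbf{r}\right] + \frac{\partial}{\partial x}\left[\int_{(x_1,y)}^{(x,y)} \mathbf{F}\cdot d\mathbf{r}\right] \\
&= 0 + \frac{\partial}{\partial x}\left[\int_{x_1}^{x} P(x(t), y)\,x'(t)\,dt + Q(x(t), y)\,y'(t)\,dt\right] \\
&= \frac{\partial}{\partial x}\left[\int_{x_1}^{x} P(t, y)\,dt + 0\right] = P(x, y).
\end{aligned}
$$

Here the first term does not depend on $x$, and along the horizontal segment $x'(t) = 1$ and $y'(t) = 0$. Using a similar argument with a path that ends in a vertical segment, we can show that $\partial f/\partial y = Q$. We have thus shown that $\mathbf{F} = \nabla f$ is conservative, and conclude that $\mathbf{F}$ is a conservative vector field if and only if its line integral is independent of path.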


However, in order to use the Fundamental Theorem of Line Integrals to evaluate the line integral of a conservative vector field, it is necessary to obtain the function $f$ such that $\nabla f = \mathbf{F}$. Furthermore, the theorem cannot be applied to a vector field that is not conservative, so we need to be able to confirm that a given vector field is conservative before we proceed.

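Continuing to restrict ourselves to the two-variable case, suppose that $\mathbf{F} = \langle P, Q\rangle$ is a conservative vector field defined on a domain $D$, and that $P$ and $Q$ have continuous first partial derivatives. Then, we have

$$\frac{\partial f}{\partial x} = P, \qquad \frac{\partial f}{\partial y} = Q,$$

for some function $f$. It follows that

$$\frac{\partial P}{\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}, \qquad \frac{\partial Q}{\partial x} = \frac{\partial^2 f}{\partial x\,\partial y}.$$

However, by Clairaut's Theorem, these mixed second partial derivatives of $f$ are equal, so it follows that

$$\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}$$

if $\mathbf{F} = \langle P, Q\rangle$ is conservative.

If the domain $D$ is simply connected, meaning that any region enclosed by a closed curve in $D$ contains only points in $D$ (informally, $D$ has "no holes"), then the converse is true: if

$$\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}$$

in $D$, then $\mathbf{F} = \langle P, Q\rangle$ is a conservative vector field. Similarly, if $\mathbf{F} = \langle P, Q, R\rangle$ is a vector field with continuous first partial derivatives defined on a simply connected domain $D \subseteq \mathbb{R}^3$, and

$$\frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}, \qquad \frac{\partial P}{\partial z} = \frac{\partial R}{\partial x}, \qquad \frac{\partial Q}{\partial z} = \frac{\partial R}{\partial y},$$

then $\mathbf{F}$ is conservative.

It remains to find the function $f$ such that $\nabla f = \mathbf{F}$ for a given vector field $\mathbf{F} = \langle P, Q\rangle$ that is known to be conservative. The general technique is as follows: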

• Integrate $P(x, y)$ with respect to $x$ to obtain

$$f(x, y) = f_1(x, y) + g(y),$$

where $f_1(x, y)$ is obtained by anti-differentiation of $P(x, y)$, and $g(y)$ is an unknown function that plays the role of the constant of integration, since $f(x, y)$ is obtained by anti-differentiating with respect to $x$.

• Differentiate $f$ with respect to $y$ to obtain

$$\frac{\partial}{\partial y}[f_1(x, y)] + g'(y) = Q(x, y),$$

and solve for $g'(y)$.

• Integrate $g'(y)$ with respect to $y$ to complete the definition of $f(x, y)$, up to a constant of integration.

A similar procedure can be used for a vector field defined on $\mathbb{R}^3$, except that the function $g$ depends on both $y$ and $z$, and differentiation with respect to both $y$ and $z$ is needed to completely define the function $f(x, y, z)$ such that $\nabla f = \mathbf{F}$.
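Before carrying out this procedure, the equal-mixed-partials conditions can be spot-checked numerically. The sketch below is an illustration under our own assumptions: it approximates $\operatorname{curl}\mathbf{F} = \langle R_y - Q_z,\, P_z - R_x,\, Q_x - P_y\rangle$ by central differences (the helper name `curl` is ours) and applies it to the field of the next example and to the comparison field $\langle -y, x, 0\rangle$, which is not conservative. A numerical check at a few points is evidence, not a proof.

```python
def curl(F, p, h=1e-5):
    """Central-difference approximation of <R_y - Q_z, P_z - R_x, Q_x - P_y> at the point p."""
    def d(i, j):
        # partial derivative of component i with respect to coordinate j
        plus = list(p)
        minus = list(p)
        plus[j] += h
        minus[j] -= h
        return (F(*plus)[i] - F(*minus)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

# Field from the example below: F = <2xz + y^2, 2xy, x^2 + 3z^2>
F = lambda x, y, z: (2 * x * z + y ** 2, 2 * x * y, x ** 2 + 3 * z ** 2)
# A non-conservative comparison field G = <-y, x, 0>, whose curl is <0, 0, 2>
G = lambda x, y, z: (-y, x, 0.0)

for point in [(0.3, -1.2, 2.0), (1.5, 0.7, -0.4)]:
    print(curl(F, point), curl(G, point))
```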

Example (Stewart, Section 13.3, Exercise 14) Let

$$\mathbf{F}(x, y, z) = \langle P(x, y, z), Q(x, y, z), R(x, y, z)\rangle = \langle 2xz + y^2,\; 2xy,\; x^2 + 3z^2\rangle.$$

To confirm that $\mathbf{F}$ is conservative, we check the appropriate first partial derivatives of $P$, $Q$ and $R$:

$$P_y = 2y = Q_x, \qquad P_z = 2x = R_x, \qquad Q_z = 0 = R_y.$$

Now, to find a function $f(x, y, z)$ such that $\nabla f = \mathbf{F}$, which must satisfy $f_x = P$, we integrate $P(x, y, z)$ with respect to $x$ and obtain

$$f(x, y, z) = x^2 z + y^2 x + g(y, z).$$

Differentiating with respect to $y$ and $z$ yields the equations

$$
\begin{aligned}
f_y(x, y, z) &= 2xy + g_y(y, z) = Q(x, y, z) = 2xy, \\
f_z(x, y, z) &= x^2 + g_z(y, z) = R(x, y, z) = x^2 + 3z^2.
\end{aligned}
$$

It follows that

$$g_y(y, z) = 0, \qquad g_z(y, z) = 3z^2,$$

which yields

$$g(y, z) = z^3 + K$$

for some constant $K$. We conclude that $\mathbf{F} = \nabla f$, where

$$f(x, y, z) = x^2 z + y^2 x + z^3 + K$$

and $K$ is an arbitrary constant.

To evaluate the line integral of $\mathbf{F}$ over the curve $C$ parametrized by

$$x = t^2, \quad y = t + 1, \quad z = 2t - 1, \quad 0 \le t \le 1,$$

we apply the Fundamental Theorem of Line Integrals and obtain

$$
\begin{aligned}
\int_C \mathbf{F}\cdot d\mathbf{r} &= f(x(1), y(1), z(1)) - f(x(0), y(0), z(0)) \\
&= f(1, 2, 1) - f(0, 1, -1) \\
&= 1^2(1) + 2^2(1) + 1^3 + K - \big(0^2(-1) + 1^2(0) + (-1)^3 + K\big) \\
&= 1 + 4 + 1 + K - (0 + 0 - 1 + K) \\
&= 7.
\end{aligned}
$$

□
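The Fundamental Theorem of Line Integrals predicts that integrating $\mathbf{F}(\mathbf{r}(t))\cdot\mathbf{r}'(t)$ directly gives the same value as $f(\text{end}) - f(\text{start})$. A short sketch comparing the two, assuming Simpson's rule for the direct integral and taking $K = 0$ in the potential found above:

```python
def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for a scalar function g on [a, b] (n even)."""
    h = (b - a) / n
    s = g(a) + g(b)
    s += 4 * sum(g(a + k * h) for k in range(1, n, 2))
    s += 2 * sum(g(a + k * h) for k in range(2, n, 2))
    return s * h / 3

F = lambda x, y, z: (2 * x * z + y ** 2, 2 * x * y, x ** 2 + 3 * z ** 2)
f = lambda x, y, z: x ** 2 * z + y ** 2 * x + z ** 3   # potential with K = 0

r = lambda t: (t ** 2, t + 1, 2 * t - 1)
dr = lambda t: (2 * t, 1.0, 2.0)

direct = simpson(lambda t: sum(a * b for a, b in zip(F(*r(t)), dr(t))), 0.0, 1.0)
via_potential = f(*r(1.0)) - f(*r(0.0))
print(direct, via_potential)   # both 7, the first up to rounding
```

The direct route needs the whole path; the potential route needs only the endpoints, which is the practical payoff of the theorem.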

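Let $\mathbf{F}$ represent a force field. Then, recall that the work done by the force field to move an object along a path $\mathbf{r}(t)$, $a \le t \le b$, is given by the line integral

$$W = \int_C \mathbf{F}\cdot d\mathbf{r} = \int_a^b \mathbf{F}(\mathbf{r}(t))\cdot \mathbf{r}'(t)\,dt.$$

From Newton's Second Law of Motion, we have

$$\mathbf{F}(\mathbf{r}(t)) = m\,\mathbf{r}''(t),$$

where $m$ is the mass of the object, and $\mathbf{r}''(t) = \mathbf{a}(t)$ is its acceleration. We then have

$$
\begin{aligned}
W &= \int_a^b m\,\mathbf{r}''(t)\cdot \mathbf{r}'(t)\,dt \\
&= \frac{1}{2}m\int_a^b \frac{d}{dt}\big[\mathbf{r}'(t)\cdot \mathbf{r}'(t)\big]\,dt \\
&= \frac{1}{2}m\int_a^b \frac{d}{dt}\big[\|\mathbf{r}'(t)\|^2\big]\,dt \\
&= \frac{1}{2}m\|\mathbf{v}(b)\|^2 - \frac{1}{2}m\|\mathbf{v}(a)\|^2,
\end{aligned}
$$

where $\mathbf{v}(t) = \mathbf{r}'(t)$ is the velocity of the object.

It follows that

$$W = K(B) - K(A),$$

where $A = \mathbf{r}(a)$ and $B = \mathbf{r}(b)$ are the initial and terminal points, respectively, and

$$K(P) = \frac{1}{2}m\|\mathbf{v}(t)\|^2, \qquad \mathbf{r}(t) = P,$$

is the kinetic energy of the object at the point $P$. That is, the work done by the force field along $C$ is the change in the kinetic energy from point $A$ to point $B$.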

If $\mathbf{F}$ is also a conservative force field, then $\mathbf{F} = -\nabla P$, where $P$ is the potential energy. It follows from the Fundamental Theorem of Line Integrals that

$$W = \int_C \mathbf{F}\cdot d\mathbf{r} = -\int_C \nabla P\cdot d\mathbf{r} = -[P(B) - P(A)].$$

We conclude that

$$P(A) + K(A) = P(B) + K(B).$$

That is, when an object is moved by a conservative force field, its total energy remains constant. This is known as the Law of Conservation of Energy.
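Both identities can be checked on a concrete motion. The sketch below assumes a hypothetical projectile of mass $m$ under constant gravity, $\mathbf{F} = \langle 0, 0, -mg\rangle = -\nabla(mgz)$, whose trajectory satisfies $m\mathbf{r}'' = \mathbf{F}$; the values of `m`, `g` and `v0` are illustrative choices, not from the text.

```python
m, g = 2.0, 9.81            # hypothetical mass (kg) and gravitational acceleration (m/s^2)
v0 = (3.0, 0.0, 12.0)       # hypothetical initial velocity

# Projectile motion satisfies Newton's law m r'' = F with F = <0, 0, -m g>.
r = lambda t: (v0[0] * t, v0[1] * t, v0[2] * t - 0.5 * g * t * t)
v = lambda t: (v0[0], v0[1], v0[2] - g * t)
F = (0.0, 0.0, -m * g)

def simpson(func, a, b, n=200):
    """Composite Simpson's rule for a scalar function on [a, b] (n even)."""
    h = (b - a) / n
    s = func(a) + func(b)
    s += 4 * sum(func(a + k * h) for k in range(1, n, 2))
    s += 2 * sum(func(a + k * h) for k in range(2, n, 2))
    return s * h / 3

a, b = 0.0, 1.5
W = simpson(lambda t: sum(Fi * vi for Fi, vi in zip(F, v(t))), a, b)   # work = integral of F . r'

K = lambda t: 0.5 * m * sum(c * c for c in v(t))   # kinetic energy
P = lambda t: m * g * r(t)[2]                       # potential energy, since F = -grad(m g z)

print(W, K(b) - K(a))                # work equals the change in kinetic energy
print(P(a) + K(a), P(b) + K(b))      # total energy is the same at both ends
```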

3.4 Green’s Theorem

We have learned that if a vector field is conservative, then its line integral over a closed curve $C$ is equal to zero. However, if this is not the case, then evaluation of a line integral using the formula

$$\int_C \mathbf{F}\cdot d\mathbf{r} = \int_a^b \mathbf{F}(\mathbf{r}(t))\cdot \mathbf{r}'(t)\,dt,$$

where $\mathbf{r}(t)$ is a parameterization of $C$, can be very difficult, even if $C$ is a relatively simple plane curve. Fortunately, when $C$ is a closed plane curve, there is an alternative approach, using a result known as Green's Theorem.

We assume that $\mathbf{F} = \langle P, Q\rangle$, and consider the case where $C$ encloses a region $D$ that is of both type I and type II. That is, $D$ has the definitions

$$D = \{(x, y) \mid a \le x \le b,\; g_1(x) \le y \le g_2(x)\}$$


and

$$D = \{(x, y) \mid c \le y \le d,\; h_1(y) \le x \le h_2(y)\}.$$

Using the first definition, we have $C = C_1 \cup C_2 \cup (-C_3) \cup (-C_4)$, where:

• $C_1$ is the curve with parameterization $x = t$, $y = g_1(t)$, for $a \le t \le b$

• $C_2$ is the vertical line segment with parameterization $x = b$, $y = t$, for $g_1(b) \le t \le g_2(b)$

• $C_3$ is the curve with parameterization $x = t$, $y = g_2(t)$, for $a \le t \le b$

• $C_4$ is the vertical line segment with parameterization $x = a$, $y = t$, for $g_1(a) \le t \le g_2(a)$

We use positive orientation to describe the curve $C$, which means that the curve is traversed counterclockwise. This means that as the curve is traversed, the region $D$ is "on the left".

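In view of

$$\int_C \mathbf{F}\cdot d\mathbf{r} = \int_C P\,dx + Q\,dy,$$

we have

$$
\begin{aligned}
\int_C P\,dx &= \int_{C_1} P\,dx + \int_{C_2} P\,dx + \int_{-C_3} P\,dx + \int_{-C_4} P\,dx \\
&= \int_{C_1} P\,dx + \int_{C_2} P\,dx - \int_{C_3} P\,dx - \int_{C_4} P\,dx \\
&= \int_a^b P(x(t), y(t))\,x'(t)\,dt + \int_{g_1(b)}^{g_2(b)} P(x(t), y(t))\,x'(t)\,dt - \int_a^b P(x(t), y(t))\,x'(t)\,dt - \int_{g_1(a)}^{g_2(a)} P(x(t), y(t))\,x'(t)\,dt \\
&= \int_a^b P(t, g_1(t))(1)\,dt + \int_{g_1(b)}^{g_2(b)} P(b, t)(0)\,dt - \int_a^b P(t, g_2(t))(1)\,dt - \int_{g_1(a)}^{g_2(a)} P(a, t)(0)\,dt \\
&= \int_a^b [P(t, g_1(t)) - P(t, g_2(t))]\,dt \\
&= -\int_a^b \int_{g_1(t)}^{g_2(t)} P_y(t, y)\,dy\,dt \\
&= -\iint_D \frac{\partial P}{\partial y}\,dA.
\end{aligned}
$$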


Using a similar approach in which $D$ is viewed as a region of type II, we obtain

$$\int_C Q\,dy = \iint_D \frac{\partial Q}{\partial x}\,dA.$$

Putting these results together, we obtain Green's Theorem, which states that if $C$ is a positively oriented, piecewise smooth, simple (that is, not self-intersecting) closed curve that encloses a region $D$, and $P$ and $Q$ are functions that have continuous first partial derivatives on $D$, then

$$\int_C P\,dx + Q\,dy = \iint_D \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA.$$

Another common statement of the theorem is

$$\iint_D \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA = \int_{\partial D} P\,dx + Q\,dy,$$

where $\partial D$ denotes the positively oriented boundary of $D$.

This theorem can be used to find a simpler approach to evaluating a line integral of the vector field $\langle P, Q\rangle$ over $C$ by converting the integral to a double integral over $D$, or it can be used to find a simpler approach to evaluating a double integral over a region $D$ by converting it into an integral over its boundary.
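To see the theorem at work, the sketch below evaluates both sides for a choice of our own (not from the text): $P = -x^2 y$ and $Q = x y^2$ on the unit disk, where $\partial Q/\partial x - \partial P/\partial y = x^2 + y^2$ and both sides equal $\pi/2$. The quadrature rules (trapezoid rule on the periodic boundary, midpoint rule in polar coordinates) and the function names are assumptions of this sketch.

```python
import math

P = lambda x, y: -x * x * y
Q = lambda x, y: x * y * y
scalar_curl = lambda x, y: x * x + y * y   # dQ/dx - dP/dy = y^2 + x^2, computed by hand

def boundary_integral(n=400):
    """Integral of P dx + Q dy over the counterclockwise unit circle (trapezoid rule)."""
    total = 0.0
    for k in range(n):
        t = 2 * math.pi * k / n
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t), math.cos(t)
        total += P(x, y) * dx + Q(x, y) * dy
    return total * 2 * math.pi / n

def area_integral(nr=400, nt=64):
    """Double integral of the scalar curl over the unit disk in polar coordinates (midpoint rule)."""
    total = 0.0
    for i in range(nr):
        rho = (i + 0.5) / nr
        for j in range(nt):
            th = 2 * math.pi * (j + 0.5) / nt
            total += scalar_curl(rho * math.cos(th), rho * math.sin(th)) * rho   # dA = rho d(rho) d(theta)
    return total * (1.0 / nr) * (2 * math.pi / nt)

print(boundary_integral(), area_integral(), math.pi / 2)
```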

To show that Green's Theorem applies for more general regions than those that are of both type I and type II, we consider a region $D$ that is the union of two regions $D_1$ and $D_2$ that are of both type I and type II. Let $C$ be the positively oriented boundary of $D$, let $D_1$ have positively oriented boundary $C_1 \cup C_3$, and let $D_2$ have positively oriented boundary $C_2 \cup (-C_3)$, where $C_3$ is the boundary between $D_1$ and $D_2$. Then, $C = C_1 \cup C_2$. It follows that for functions $P$ and $Q$ that satisfy the assumptions of Green's Theorem on $D$, we can apply the theorem to $D_1$ and $D_2$ individually to obtain

$$
\begin{aligned}
\iint_D \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA &= \iint_{D_1} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA + \iint_{D_2} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA \\
&= \int_{C_1 \cup C_3} P\,dx + Q\,dy + \int_{C_2 \cup (-C_3)} P\,dx + Q\,dy \\
&= \int_{C_1} P\,dx + Q\,dy + \int_{C_3} P\,dx + Q\,dy + \int_{C_2} P\,dx + Q\,dy + \int_{-C_3} P\,dx + Q\,dy \\
&= \int_{C_1} P\,dx + Q\,dy + \int_{C_2} P\,dx + Q\,dy + \int_{C_3} P\,dx + Q\,dy - \int_{C_3} P\,dx + Q\,dy \\
&= \int_{C_1} P\,dx + Q\,dy + \int_{C_2} P\,dx + Q\,dy \\
&= \int_{C_1 \cup C_2} P\,dx + Q\,dy \\
&= \int_C P\,dx + Q\,dy.
\end{aligned}
$$

We conclude that Green's Theorem holds on $D_1 \cup D_2$. The same argument can be used to easily show that Green's Theorem applies on any finite union of simple regions, which are regions of both type I and type II.

Green's Theorem can also be applied to regions with "holes", that is, regions that are not simply connected. To see this, let $D$ be a region enclosed by two curves $C_1$ and $C_2$ that are both positively oriented with respect to $D$ (that is, $D$ is on the left as either $C_1$ or $C_2$ is traversed). Let $C_2$ be contained within the region enclosed by $C_1$; that is, let $C_2$ be the boundary of the "hole" in $D$. Then, we can decompose $D$ into two simply connected regions $D'$ and $D''$ by connecting $C_2$ to $C_1$ along two separate curves that lie within $D$. Applying Green's Theorem to $D'$ and $D''$ individually, we find that the line integrals along the common boundaries of $D'$ and $D''$ cancel, because they have opposite orientations with respect to these regions. Therefore, we have

$$
\begin{aligned}
\iint_D \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA &= \iint_{D'} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA + \iint_{D''} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA \\
&= \int_{C_1} P\,dx + Q\,dy + \int_{C_2} P\,dx + Q\,dy \\
&= \int_{C_1 \cup C_2} P\,dx + Q\,dy.
\end{aligned}
$$

Therefore, Green's Theorem applies to $D$ as well.

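Example The vector field

$$\mathbf{F}(x, y) = \langle P(x, y), Q(x, y)\rangle = \left\langle \frac{-y}{x^2 + y^2}, \frac{x}{x^2 + y^2}\right\rangle$$

is defined on all of $\mathbb{R}^2$ except at the origin, and on that domain it satisfies

$$\frac{\partial Q}{\partial x} = \frac{y^2 - x^2}{(x^2 + y^2)^2} = \frac{\partial P}{\partial y}.$$

Moreover, wherever $x \ne 0$, $\mathbf{F} = \nabla f$ where

$$f(x, y) = \tan^{-1}\frac{y}{x}.$$

Now, consider a region $D$ that is enclosed by a positively oriented, piecewise smooth, simple closed curve $C$, and also has a "hole" that is a disk of radius $a$, centered at the origin, and contained entirely within $C$. Let $C'$ be the positively oriented boundary of this disk. Then, the boundary of $D$ is $C \cup (-C')$, because, as a portion of the boundary of $D$ rather than of the disk, $C'$ must switch orientation. Applying Green's Theorem to compute the line integral of $\mathbf{F}$ over the boundary of $D$ yields

$$\int_C P\,dx + Q\,dy + \int_{-C'} P\,dx + Q\,dy = \iint_D \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA = 0,$$

since $\partial Q/\partial x = \partial P/\partial y$ throughout $D$. It follows that

$$\int_C \mathbf{F}\cdot d\mathbf{r} = -\int_{-C'} \mathbf{F}\cdot d\mathbf{r} = \int_{C'} \mathbf{F}\cdot d\mathbf{r},$$

so we can compute the line integral of $\mathbf{F}$ over $C$, which we have not specified, by computing the line integral over the circle $C'$, which can be parameterized by $x = a\cos t$, $y = a\sin t$, for $0 \le t \le 2\pi$. This yields

$$
\begin{aligned}
\int_{C'} \mathbf{F}\cdot d\mathbf{r} &= \int_0^{2\pi} P(x(t), y(t))\,x'(t)\,dt + Q(x(t), y(t))\,y'(t)\,dt \\
&= \int_0^{2\pi} \left(\frac{-a\sin t}{(a\cos t)^2 + (a\sin t)^2}\right)(-a\sin t)\,dt + \left(\frac{a\cos t}{(a\cos t)^2 + (a\sin t)^2}\right)(a\cos t)\,dt \\
&= \int_0^{2\pi} \frac{a^2\sin^2 t}{a^2\cos^2 t + a^2\sin^2 t}\,dt + \frac{a^2\cos^2 t}{a^2\cos^2 t + a^2\sin^2 t}\,dt \\
&= \int_0^{2\pi} 1\,dt \\
&= 2\pi.
\end{aligned}
$$

We conclude that the line integral of $\mathbf{F}$ over any positively oriented, piecewise smooth, simple closed curve that encloses the origin is equal to $2\pi$. Since this value is nonzero, $\mathbf{F}$ is not conservative on $\mathbb{R}^2$ minus the origin, even though $\partial Q/\partial x = \partial P/\partial y$ there; this does not contradict the earlier test, because that domain is not simply connected. □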
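The conclusion can be checked on curves other than circles. In this sketch, the ellipse $x = 3\cos t$, $y = \tfrac{1}{2}\sin t$ (which encloses the origin) and the unit circle centered at $(3, 0)$ (which does not) are our own choices; the trapezoid rule is used because it converges very quickly for smooth periodic integrands. The helper name `circulation` is hypothetical.

```python
import math

def circulation(x, y, dx, dy, n=2000):
    """Trapezoid-rule value of the integral of (-y dx + x dy)/(x^2 + y^2) over a closed curve on [0, 2*pi]."""
    total = 0.0
    for k in range(n):
        t = 2 * math.pi * k / n
        X, Y = x(t), y(t)
        total += (-Y * dx(t) + X * dy(t)) / (X * X + Y * Y)
    return total * 2 * math.pi / n

# Ellipse enclosing the origin
around = circulation(lambda t: 3 * math.cos(t), lambda t: 0.5 * math.sin(t),
                     lambda t: -3 * math.sin(t), lambda t: 0.5 * math.cos(t))
# Unit circle centered at (3, 0), which does not enclose the origin
away = circulation(lambda t: 3 + math.cos(t), lambda t: math.sin(t),
                   lambda t: -math.sin(t), lambda t: math.cos(t))
print(around, 2 * math.pi, away)
```

The first value matches $2\pi$, while the second is zero up to rounding, since Green's Theorem applies directly to the disk that curve encloses.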

Example Consider an $n$-sided polygon $P$ with vertices $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, listed in counterclockwise order. The area $A$ of the polygon is given by the double integral

$$A = \iint_P 1\,dA.$$


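Let $P(x, y) = -y/2$ and $Q(x, y) = x/2$. Then

$$\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} = \frac{1}{2} - \left(-\frac{1}{2}\right) = 1.$$

It follows from Green's Theorem that if $\partial P$ is positively oriented, then

$$A = \int_{\partial P} Q\,dy + P\,dx = \frac{1}{2}\int_{\partial P} x\,dy - y\,dx.$$

To evaluate this line integral, we consider each edge of $P$ individually. Let $C$ be the line segment from $(x_1, y_1)$ to $(x_2, y_2)$, and assume, for convenience, that $C$ is not vertical. Then $C$ can be parameterized by $x = t$, $y = mt + b$, with $t$ running from $x_1$ to $x_2$, where

$$m = \frac{y_2 - y_1}{x_2 - x_1}, \qquad b = y_1 - m x_1.$$

We then have

$$
\begin{aligned}
\int_C x\,dy - y\,dx &= \int_{x_1}^{x_2} m t\,dt - (mt + b)\,dt \\
&= -\int_{x_1}^{x_2} b\,dt \\
&= b(x_1 - x_2) \\
&= y_1(x_1 - x_2) - m x_1(x_1 - x_2) \\
&= y_1(x_1 - x_2) + (y_2 - y_1)x_1 \\
&= x_1 y_2 - x_2 y_1.
\end{aligned}
$$

Applying this result to every edge, including the closing edge from $(x_n, y_n)$ back to $(x_1, y_1)$, we conclude that

$$A = \frac{1}{2}\big[(x_1 y_2 - x_2 y_1) + (x_2 y_3 - x_3 y_2) + \cdots + (x_{n-1} y_n - x_n y_{n-1}) + (x_n y_1 - x_1 y_n)\big].$$

□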
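This formula is widely known as the shoelace formula. A minimal Python sketch, assuming vertices are given as a list of $(x, y)$ tuples; the function name `polygon_area` is ours. The sign of the result records the orientation: listing the vertices clockwise traverses $\partial P$ negatively, which flips the sign.

```python
def polygon_area(vertices):
    """Shoelace formula: (1/2) * sum of (x_i y_{i+1} - x_{i+1} y_i), wrapping the last vertex to the first.

    Positive for counterclockwise vertex order, negative for clockwise order.
    """
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]   # (x_n, y_n) pairs with (x_1, y_1)
        s += x1 * y2 - x2 * y1
    return s / 2

print(polygon_area([(0, 0), (4, 0), (0, 3)]))          # 6.0
print(polygon_area([(0, 0), (0, 3), (4, 0)]))          # -6.0 (clockwise)
print(polygon_area([(0, 0), (2, 0), (2, 1), (0, 1)]))  # 2.0
```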

3.5 Curl and Divergence

We have seen two theorems in vector calculus, the Fundamental Theorem of Line Integrals and Green's Theorem, that relate an integral over a set to an integral over its boundary. Before establishing similar results that apply to surfaces and solids, it is helpful to introduce new operations on vector fields that will simplify exposition.

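We have previously learned that a vector field $\mathbf{F} = \langle P, Q, R\rangle$ defined on $\mathbb{R}^3$ is conservative if

$$R_y - Q_z = 0, \qquad P_z - R_x = 0, \qquad Q_x - P_y = 0.$$

These equations are equivalent to the statement

$$\left\langle \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right\rangle \times \langle P, Q, R\rangle = \langle 0, 0, 0\rangle,$$

since the components of this formal cross product are exactly $R_y - Q_z$, $P_z - R_x$ and $Q_x - P_y$. Therefore, we define the curl of a vector field $\mathbf{F} = \langle P, Q, R\rangle$ by

$$\operatorname{curl}\mathbf{F} = \nabla \times \mathbf{F} = \langle R_y - Q_z,\; P_z - R_x,\; Q_x - P_y\rangle,$$

where

$$\nabla = \left\langle \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right\rangle.$$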

From the definition of a conservative vector field, it follows that $\operatorname{curl}\mathbf{F} = \mathbf{0}$ if $\mathbf{F} = \nabla f$ where $f$ has continuous second partial derivatives, due to Clairaut's Theorem. That is, the curl of a gradient is zero.

This is equivalent to the statement that the curl of a conservative vector field is zero. The converse, that a vector field $\mathbf{F}$ for which $\operatorname{curl}\mathbf{F} = \mathbf{0}$ is conservative, is also true if $\mathbf{F}$ has continuous first partial derivatives and $\operatorname{curl}\mathbf{F} = \mathbf{0}$ within a simply connected domain. That is, the domain must not have "holes".

When $\mathbf{F}$ represents the velocity field of a fluid, the fluid tends to rotate around the axis that is aligned with $\operatorname{curl}\mathbf{F}$, and the magnitude of $\operatorname{curl}\mathbf{F}$ indicates the speed of rotation. Therefore, when $\operatorname{curl}\mathbf{F} = \mathbf{0}$, we say that $\mathbf{F}$ is irrotational, which is a term that has previously been associated with the equivalent condition of $\mathbf{F}$ being conservative.

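Another operation that is useful for discussing properties of vector fields is the divergence of a vector field $\mathbf{F}$, denoted by $\operatorname{div}\mathbf{F}$. It is defined by

$$\operatorname{div}\mathbf{F} = \nabla\cdot\mathbf{F}.$$

For example, if $\mathbf{F} = \langle P, Q, R\rangle$, then

$$\operatorname{div}\mathbf{F} = \left\langle \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right\rangle \cdot \langle P, Q, R\rangle = P_x + Q_y + R_z.$$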


Unlike the curl, the divergence is defined for vector fields with any number of variables, as long as the number of independent variables and the number of dependent variables are the same.

It can be verified directly that if $\mathbf{F}$ is the curl of a vector field $\mathbf{G}$, then $\operatorname{div}\mathbf{F} = 0$. That is, the divergence of any curl is zero, as long as $\mathbf{G}$ has continuous second partial derivatives. This is useful for determining whether a given vector field $\mathbf{F}$ is the curl of any other vector field $\mathbf{G}$, for if it is, its divergence must be zero.

Example (Stewart, Section 13.5, Exercise 18) The vector field $\mathbf{F}(x, y, z) = \langle yz, xyz, xy\rangle$ is not the curl of any vector field $\mathbf{G}$, because

$$\operatorname{div}\mathbf{F} = (yz)_x + (xyz)_y + (xy)_z = 0 + xz + 0 = xz,$$

whereas if $\mathbf{F} = \operatorname{curl}\mathbf{G}$, then

$$\operatorname{div}\mathbf{F} = \operatorname{div}\operatorname{curl}\mathbf{G} = 0.$$

□
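Both facts can be illustrated with finite differences. The sketch below is an assumption-laden illustration, not a proof: it approximates partial derivatives by central differences, evaluates $\operatorname{div}\mathbf{F}$ for the field of the example (expecting $xz$), and checks that $\operatorname{div}\operatorname{curl}\mathbf{G} \approx 0$ for a hypothetical field $\mathbf{G} = \langle x^2 y, yz^3, xz\rangle$ of our own choosing. The helper names `partial`, `div` and `curl` are ours.

```python
def partial(F, i, j, p, h=1e-4):
    """Central difference for the derivative of component i of F with respect to coordinate j at p."""
    plus, minus = list(p), list(p)
    plus[j] += h
    minus[j] -= h
    return (F(*plus)[i] - F(*minus)[i]) / (2 * h)

def div(F, p):
    return sum(partial(F, i, i, p) for i in range(3))

def curl(F, p):
    return (partial(F, 2, 1, p) - partial(F, 1, 2, p),
            partial(F, 0, 2, p) - partial(F, 2, 0, p),
            partial(F, 1, 0, p) - partial(F, 0, 1, p))

F = lambda x, y, z: (y * z, x * y * z, x * y)         # the field in the example
G = lambda x, y, z: (x * x * y, y * z ** 3, x * z)    # hypothetical field G
curl_G = lambda x, y, z: curl(G, (x, y, z))

p = (1.0, 2.0, 3.0)
print(div(F, p))        # approximately x z = 3
print(div(curl_G, p))   # approximately 0
```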

If $\mathbf{F}$ represents the velocity field of a fluid, then, at each point within the fluid, $\operatorname{div}\mathbf{F}$ measures the tendency of the fluid to diverge away from that point. Specifically, $\operatorname{div}\mathbf{F}$ is the net rate, per unit volume, at which fluid flows out of a small region around that point. Therefore, if $\operatorname{div}\mathbf{F} = 0$, then we say that $\mathbf{F}$, and therefore the fluid as well, is incompressible.

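The divergence of a gradient is

$$\operatorname{div}(\nabla f) = \nabla\cdot\nabla f = \left\langle \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right\rangle \cdot \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right\rangle = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}.$$

We denote this expression $\nabla\cdot\nabla f$ by $\nabla^2 f$, or $\Delta f$, which is called the Laplacian of $f$. The operator $\nabla^2$ is called the Laplace operator. Its name comes from Laplace's equation

$$\Delta f = 0.$$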
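The Laplacian is easy to approximate with the standard second-order difference $\big(f(x+h) - 2f(x) + f(x-h)\big)/h^2$ in each coordinate. A minimal sketch under that assumption; the test functions are our own, chosen so that one satisfies Laplace's equation ($x^2 + y^2 - 2z^2$, with $\Delta f = 2 + 2 - 4 = 0$) and one does not ($x^2 + y^2 + z^2$, with $\Delta f = 6$).

```python
def laplacian(f, p, h=1e-3):
    """Second-order central-difference approximation of f_xx + f_yy + f_zz at the point p."""
    total = 0.0
    for j in range(3):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        total += (f(*plus) - 2 * f(*p) + f(*minus)) / (h * h)
    return total

harmonic = lambda x, y, z: x * x + y * y - 2 * z * z   # Laplacian is 2 + 2 - 4 = 0
bowl = lambda x, y, z: x * x + y * y + z * z           # Laplacian is 6

p = (0.5, -1.0, 2.0)
print(laplacian(harmonic, p), laplacian(bowl, p))
```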

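The curl and divergence can be used to restate Green's Theorem in forms that are more directly generalizable to surfaces and solids in $\mathbb{R}^3$. Let $\mathbf{F} = \langle P, Q, 0\rangle$, the embedding of a two-dimensional vector field in $\mathbb{R}^3$. Then

$$\operatorname{curl}\mathbf{F} = \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)\mathbf{k},$$

where, as before, $\mathbf{k} = \langle 0, 0, 1\rangle$. It follows that

$$(\operatorname{curl}\mathbf{F})\cdot\mathbf{k} = \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)\mathbf{k}\cdot\mathbf{k} = \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}.$$

This expression is called the scalar curl of the two-dimensional vector field $\langle P, Q\rangle$. We conclude that Green's Theorem can be rewritten as

$$\int_C \mathbf{F}\cdot d\mathbf{r} = \iint_D (\operatorname{curl}\mathbf{F})\cdot\mathbf{k}\,dA.$$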

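Another useful form of Green's Theorem involves the divergence. Let $\mathbf{F} = \langle P, Q\rangle$ have continuous first partial derivatives in a domain $D$ with a positively oriented, piecewise smooth boundary $C$ that has parametrization $\mathbf{r}(t) = \langle x(t), y(t)\rangle$, for $a \le t \le b$. Applying the original form of Green's Theorem to the vector field $\langle -Q, P\rangle$, we have

$$
\begin{aligned}
\iint_D \operatorname{div}\mathbf{F}\,dA &= \iint_D \left(\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y}\right) dA \\
&= \int_C P\,dy - Q\,dx \\
&= \int_a^b P(x(t), y(t))\,y'(t)\,dt - Q(x(t), y(t))\,x'(t)\,dt \\
&= \int_a^b \left[P(x(t), y(t))\frac{y'(t)}{\|\mathbf{r}'(t)\|} + Q(x(t), y(t))\frac{-x'(t)}{\|\mathbf{r}'(t)\|}\right]\|\mathbf{r}'(t)\|\,dt \\
&= \int_a^b (\mathbf{F}\cdot\mathbf{n})(t)\,\|\mathbf{r}'(t)\|\,dt \\
&= \int_C \mathbf{F}\cdot\mathbf{n}\,ds,
\end{aligned}
$$

where

$$\mathbf{n}(t) = \frac{1}{\|\mathbf{r}'(t)\|}\langle y'(t), -x'(t)\rangle$$

is the outward unit normal vector to the curve $C$. Note that $\mathbf{n}\cdot\mathbf{T} = 0$, where $\mathbf{T}$ is the unit tangent vector

$$\mathbf{T}(t) = \frac{1}{\|\mathbf{r}'(t)\|}\langle x'(t), y'(t)\rangle.$$

We have established a third form of Green's Theorem,

$$\int_C \mathbf{F}\cdot\mathbf{n}\,ds = \iint_D \operatorname{div}\mathbf{F}\,dA.$$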
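A quick numerical check of this divergence form, under our own choices: $\mathbf{F} = \langle x^3, y^3\rangle$ and $D$ the unit disk, so that $\operatorname{div}\mathbf{F} = 3x^2 + 3y^2$ and $\iint_D \operatorname{div}\mathbf{F}\,dA = \int_0^{2\pi}\!\int_0^1 3r^3\,dr\,d\theta = 3\pi/2$. The sketch builds $\mathbf{n}$ and $ds$ exactly as in the formulas above; the function name `outward_flux` is hypothetical.

```python
import math

F = lambda x, y: (x ** 3, y ** 3)   # hypothetical field; div F = 3x^2 + 3y^2

def outward_flux(x, y, dx, dy, n=400):
    """Integral of F . n ds over a closed curve on [0, 2*pi], with n = <y', -x'>/|r'| and ds = |r'| dt."""
    total = 0.0
    for k in range(n):
        t = 2 * math.pi * k / n
        speed = math.hypot(dx(t), dy(t))
        nx, ny = dy(t) / speed, -dx(t) / speed
        Fx, Fy = F(x(t), y(t))
        total += (Fx * nx + Fy * ny) * speed
    return total * 2 * math.pi / n   # trapezoid rule for a periodic integrand

flux = outward_flux(math.cos, math.sin, lambda t: -math.sin(t), math.cos)
# Over the unit disk, the double integral of 3r^2 in polar coordinates is 3 * (2*pi) * (1/4) = 3*pi/2.
print(flux, 3 * math.pi / 2)
```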


3.6 Parametric Surfaces and Their Areas

We have learned that Green's Theorem can be used to relate a line integral of a two-dimensional vector field $\mathbf{F}$ over a closed plane curve $C$ to a double integral of a component of $\operatorname{curl}\mathbf{F}$ over the region $D$ that is enclosed by $C$. Our goal is to generalize this result in such a way as to relate the line integral of a three-dimensional vector field $\mathbf{F}$ over a closed space curve $C$ to the integral of a component of $\operatorname{curl}\mathbf{F}$ over a surface bounded by $C$.

We have also learned that Green's Theorem relates the integral of the normal component of a two-dimensional vector field over a closed curve $C$ to the double integral of $\operatorname{div}\mathbf{F}$ over the region $D$ that it encloses. We wish to generalize this result in order to relate the integral of the normal component of a three-dimensional vector field $\mathbf{F}$ over a closed surface $S$ to the triple integral of $\operatorname{div}\mathbf{F}$ over the solid $E$ contained within $S$.

In order to realize either of these generalizations, we need to be able to integrate functions over piecewise smooth surfaces, just as we now know how to integrate functions over piecewise smooth curves. Whereas a smooth curve $C$, being a curved one-dimensional entity, is most conveniently described by a parameterization $\mathbf{r}(t)$, where $a \le t \le b$ and $\mathbf{r}(t)$ is a differentiable function of one variable, a smooth surface $S$, being a curved two-dimensional entity, is most conveniently described by a parametrization $\mathbf{r}(u, v)$, where $(u, v)$ lies within a 2-D region, and $\mathbf{r}(u, v) = \langle x(u, v), y(u, v), z(u, v)\rangle$ is a differentiable function of two variables. We say that $S$ is a parametric surface, and

$$x = x(u, v), \quad y = y(u, v), \quad z = z(u, v)$$

are the parametric equations of $S$.

Example The Mobius strip is a surface that is famous for being a nonorientable surface; that is, it "has only one side". It can be parameterized by

$$
\begin{aligned}
x(u, v) &= \left(1 + \frac{v}{2}\cos\frac{u}{2}\right)\cos u, \\
y(u, v) &= \left(1 + \frac{v}{2}\cos\frac{u}{2}\right)\sin u, \\
z(u, v) &= \frac{v}{2}\sin\frac{u}{2},
\end{aligned}
$$

where $0 \le u \le 2\pi$ and $-1 \le v \le 1$. It is shown in Figure 3.3. □

Figure 3.3: The Mobius strip

Example The paraboloid defined by the equation $x = 4y^2 + 4z^2$, $0 \le x \le 4$, can also be defined by the parametric equations

$$x = x, \quad y = \frac{\sqrt{x}}{2}\cos\theta, \quad z = \frac{\sqrt{x}}{2}\sin\theta,$$

where $0 \le \theta \le 2\pi$, since for each $x$, a point $(x, y, z)$ on the paraboloid must lie on a circle centered at $(x, 0, 0)$ with radius $\sqrt{x/4} = \sqrt{x}/2$, parallel to the $yz$-plane.

This is an example of a surface of revolution, since the surface is obtained by revolving the curve $y = \sqrt{x}/2$, $0 \le x \le 4$, around the $x$-axis. □

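Let $P_0 = (x_0, y_0, z_0) = \mathbf{r}(u_0, v_0)$ be a point on a parametric surface $S$. A curve defined by $\mathbf{g}(v) = \mathbf{r}(u_0, v)$ that lies within $S$ and passes through $P_0$ has the tangent vector

$$\mathbf{r}_v = \mathbf{g}'(v_0) = \left\langle \frac{\partial x}{\partial v}(u_0, v_0), \frac{\partial y}{\partial v}(u_0, v_0), \frac{\partial z}{\partial v}(u_0, v_0)\right\rangle$$

at $P_0$. Similarly, the tangent vector at $P_0$ of the curve $\mathbf{h}(u) = \mathbf{r}(u, v_0)$, which also lies within $S$ and passes through $P_0$, is

$$\mathbf{r}_u = \mathbf{h}'(u_0) = \left\langle \frac{\partial x}{\partial u}(u_0, v_0), \frac{\partial y}{\partial u}(u_0, v_0), \frac{\partial z}{\partial u}(u_0, v_0)\right\rangle.$$

If these vectors are not parallel, then together they define the tangent plane of $S$ at $P_0$. Its normal vector is

$$\mathbf{r}_u \times \mathbf{r}_v = \langle a, b, c\rangle,$$

which yields the equation

$$a(x - x_0) + b(y - y_0) + c(z - z_0) = 0$$

of the tangent plane.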

Example (Stewart, Section 13.6, Exercise 30) Consider the surface defined by the parametric equations

$$x = u^2, \quad y = v^2, \quad z = uv, \quad 0 \le u, v \le 10.$$

At the point $(x_0, y_0, z_0) = (1, 1, 1)$, which corresponds to $u_0 = 1$, $v_0 = 1$, the equation of the tangent plane can be obtained by first computing the partial derivatives of the coordinate functions. We have

$$
\begin{aligned}
x_u &= 2u, & y_u &= 0, & z_u &= v, \\
x_v &= 0, & y_v &= 2v, & z_v &= u.
\end{aligned}
$$

Evaluating at $(u_0, v_0)$ yields

$$\mathbf{r}_u = \langle x_u, y_u, z_u\rangle = \langle 2, 0, 1\rangle, \qquad \mathbf{r}_v = \langle x_v, y_v, z_v\rangle = \langle 0, 2, 1\rangle.$$

It follows that the normal to the tangent plane is

$$\mathbf{n} = \mathbf{r}_u \times \mathbf{r}_v = \langle 2, 0, 1\rangle \times \langle 0, 2, 1\rangle = \langle -2, -2, 4\rangle.$$

We conclude that the equation of the tangent plane is

$$-2(x - 1) - 2(y - 1) + 4(z - 1) = 0.$$

□
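The computation reduces to evaluating $\mathbf{r}_u$ and $\mathbf{r}_v$ and taking one cross product. A small sketch of that recipe for this particular surface; the function names `cross` and `tangent_plane` are our own, and the partial derivatives are the hand-computed ones above.

```python
def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def tangent_plane(u0, v0):
    """Normal vector and base point of the tangent plane to x = u^2, y = v^2, z = uv at (u0, v0)."""
    point = (u0 ** 2, v0 ** 2, u0 * v0)
    r_u = (2 * u0, 0, v0)   # <x_u, y_u, z_u>
    r_v = (0, 2 * v0, u0)   # <x_v, y_v, z_v>
    return cross(r_u, r_v), point

normal, point = tangent_plane(1, 1)
print(normal, point)   # (-2, -2, 4) (1, 1, 1)
```

The plane is then $a(x - x_0) + b(y - y_0) + c(z - z_0) = 0$ with $\langle a, b, c\rangle$ = `normal` and $(x_0, y_0, z_0)$ = `point`.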

The vectors $\mathbf{r}_u$ and $\mathbf{r}_v$ are helpful for computing the area of a smooth surface $S$. For simplicity, we assume that $S$ is parametrized by a function $\mathbf{r}(u, v)$ with domain $D$, where $D = [a, b] \times [c, d]$ is a rectangle in the $uv$-plane. We divide $[a, b]$ into $n$ subintervals $[u_{i-1}, u_i]$ of width $\Delta u = (b - a)/n$, and divide $[c, d]$ into $m$ subintervals $[v_{j-1}, v_j]$ of width $\Delta v = (d - c)/m$.

Then, $\mathbf{r}$ approximately maps the rectangle $R_{ij}$ with lower left corner $(u_{i-1}, v_{j-1})$ into a parallelogram with adjacent edges defined by the vectors

$$\mathbf{r}(u_i, v_{j-1}) - \mathbf{r}(u_{i-1}, v_{j-1}) \approx \mathbf{r}_u\,\Delta u$$

and

$$\mathbf{r}(u_{i-1}, v_j) - \mathbf{r}(u_{i-1}, v_{j-1}) \approx \mathbf{r}_v\,\Delta v.$$

The area of this parallelogram is

$$A_{ij} = \|\mathbf{r}_u \times \mathbf{r}_v\|\,\Delta u\,\Delta v.$$

Adding all of these areas approximates the area of $S$, which we denote by $A(S)$. If we let $m, n \to \infty$, we obtain

$$A(S) = \lim_{m,n\to\infty} \sum_{i=1}^{n}\sum_{j=1}^{m} A_{ij} = \iint_D \|\mathbf{r}_u \times \mathbf{r}_v\|\,dA.$$

Example (Stewart, Section 13.6, Exercise 34) We wish to find the area of the surface $S$ that is the part of the plane $2x + 5y + z = 10$ that lies inside the cylinder $x^2 + y^2 = 9$. First, we must find parametric equations for this surface. Because $x$ and $y$ are restricted to the disk of radius 3 centered at the origin, it makes sense to use polar coordinates for $x$ and $y$. We then have the parametric equations

$$x = u\cos v, \quad y = u\sin v, \quad z = 10 - u(2\cos v + 5\sin v),$$

where $0 \le u \le 3$ and $0 \le v \le 2\pi$. We then have

$$
\begin{aligned}
\mathbf{r}_u &= \langle x_u, y_u, z_u\rangle = \langle \cos v, \sin v, -2\cos v - 5\sin v\rangle, \\
\mathbf{r}_v &= \langle x_v, y_v, z_v\rangle = \langle -u\sin v, u\cos v, u(2\sin v - 5\cos v)\rangle,
\end{aligned}
$$

and therefore

$$\|\mathbf{r}_u \times \mathbf{r}_v\| = \|\langle 2u, 5u, u\rangle\| = |u|\sqrt{30}.$$

It follows that

$$A(S) = \int_0^3 \int_0^{2\pi} u\sqrt{30}\,dv\,du = 2\pi\sqrt{30}\int_0^3 u\,du = 9\pi\sqrt{30}.$$

It should be noted that it is to be expected that $\mathbf{r}_u \times \mathbf{r}_v$ is parallel to the normal vector $\langle 2, 5, 1\rangle$ of the plane $2x + 5y + z = 10$, since it is normal to the surface at every point. □
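The area formula $\iint_D \|\mathbf{r}_u \times \mathbf{r}_v\|\,dA$ can also be applied numerically, which is handy when the cross product is messy. A sketch under our own assumptions: $\mathbf{r}_u$ and $\mathbf{r}_v$ are approximated by central differences and the double integral by the midpoint rule (the helper name `surface_area` is hypothetical), using the parametrization from this example.

```python
import math

def r(u, v):
    """Parametrization of the plane 2x + 5y + z = 10 over the disk of radius 3."""
    x, y = u * math.cos(v), u * math.sin(v)
    return (x, y, 10 - 2 * x - 5 * y)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def surface_area(r, a, b, c, d, n=100, m=100, h=1e-6):
    """Midpoint-rule approximation of the double integral of |r_u x r_v| over [a, b] x [c, d]."""
    du, dv = (b - a) / n, (d - c) / m
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * du
        for j in range(m):
            v = c + (j + 0.5) * dv
            # central differences for the tangent vectors r_u and r_v
            ru = [(p - q) / (2 * h) for p, q in zip(r(u + h, v), r(u - h, v))]
            rv = [(p - q) / (2 * h) for p, q in zip(r(u, v + h), r(u, v - h))]
            total += math.sqrt(sum(w * w for w in cross(ru, rv)))
    return total * du * dv

approx = surface_area(r, 0.0, 3.0, 0.0, 2 * math.pi)
print(approx, 9 * math.pi * math.sqrt(30))
```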


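Often, a surface is defined to be the graph of a function $z = f(x, y)$. Such a surface can be parametrized by

$$x = u, \quad y = v, \quad z = f(u, v), \quad (u, v) \in D.$$

It follows that

$$\mathbf{r}_u = \langle 1, 0, f_u\rangle, \qquad \mathbf{r}_v = \langle 0, 1, f_v\rangle.$$

We then have $\mathbf{r}_v \times \mathbf{r}_u = \langle f_u, f_v, -1\rangle$, which yields the equation of the tangent plane

$$\frac{\partial f}{\partial u}(u_0, v_0)(x - x_0) + \frac{\partial f}{\partial v}(u_0, v_0)(y - y_0) = z - z_0,$$

which, using the relations $x = u$ and $y = v$, can be rewritten as

$$\frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0) = z - z_0.$$

Recall that this is the equation of the tangent plane of a surface defined by an equation of the form $z = f(x, y)$ that had been previously defined. Since $\|\mathbf{r}_u \times \mathbf{r}_v\| = \sqrt{1 + f_x^2 + f_y^2}$, it follows that the area of such a surface is given by the double integral

$$A(S) = \iint_D \sqrt{1 + \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}\,dA.$$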

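Example (Stewart, Section 13.6, Exercise 38) To find the area $A(S)$ of the surface $z = 1 + 3x + 2y^2$ that lies above the triangle with vertices $(0, 0)$, $(0, 1)$ and $(2, 1)$, we compute

$$\frac{\partial z}{\partial x} = 3, \qquad \frac{\partial z}{\partial y} = 4y,$$

and then evaluate the double integral over the triangle, described by $0 \le y \le 1$, $0 \le x \le 2y$:

$$
\begin{aligned}
A(S) &= \int_0^1 \int_0^{2y} \sqrt{1 + \left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2}\,dx\,dy \\
&= \int_0^1 \int_0^{2y} \sqrt{10 + 16y^2}\,dx\,dy \\
&= \int_0^1 2y\sqrt{10 + 16y^2}\,dy \\
&= \frac{1}{16}\int_{10}^{26} u^{1/2}\,du \\
&= \frac{1}{24}u^{3/2}\Big|_{10}^{26} \\
&= \frac{1}{24}\left(26^{3/2} - 10^{3/2}\right) \\
&\approx 4.206,
\end{aligned}
$$

where we have used the substitution $u = 10 + 16y^2$, $du = 32y\,dy$. □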
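A numerical cross-check of the substitution, assuming the midpoint rule in $y$ (our choice). Because the integrand $\sqrt{1 + 3^2 + (4y)^2}$ does not depend on $x$, the inner integral over $0 \le x \le 2y$ is exactly $2y$ times the integrand, so only the outer integral is approximated. The function name `area_over_triangle` is hypothetical.

```python
import math

def area_over_triangle(n=400):
    """Midpoint-rule estimate of the area of z = 1 + 3x + 2y^2 over 0 <= y <= 1, 0 <= x <= 2y."""
    total = 0.0
    dy = 1.0 / n
    for j in range(n):
        y = (j + 0.5) * dy
        # inner integral over 0 <= x <= 2y of a quantity independent of x: width 2y times integrand
        total += 2 * y * math.sqrt(1 + 9 + 16 * y * y) * dy
    return total

exact = (26 ** 1.5 - 10 ** 1.5) / 24
print(area_over_triangle(), exact)   # both about 4.206
```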

A surface of revolution $S$ that is obtained by revolving the curve $y = f(x)$, $a \le x \le b$, around the $x$-axis has parametric equations

$$x = u, \quad y = f(u)\cos v, \quad z = f(u)\sin v,$$

where $a \le u \le b$ and $0 \le v \le 2\pi$. From these equations, we obtain

$$\|\mathbf{r}_u \times \mathbf{r}_v\| = |f(u)|\sqrt{1 + [f'(u)]^2},$$

which yields

$$A(S) = 2\pi\int_a^b |f(u)|\sqrt{1 + [f'(u)]^2}\,du.$$

If $y = f(x)$ is revolved around the $y$-axis instead, then the area is

$$A(S) = 2\pi\int_a^b |u|\sqrt{1 + [f'(u)]^2}\,du,$$

which can be obtained by considering the case of revolving $x = f^{-1}(y)$ around the $y$-axis and proceeding with a parametrization similar to the case of revolving around the $x$-axis.
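Two familiar surfaces make good test cases for the $x$-axis formula: the cone from $y = x$, $0 \le x \le 1$ (lateral area $\pi r \ell = \pi\sqrt{2}$) and the unit sphere from $y = \sqrt{1 - x^2}$, $-1 \le x \le 1$ (area $4\pi$). The sketch below assumes the midpoint rule, which conveniently never evaluates $f'$ at the endpoints of the sphere case, where it is unbounded; the function name `revolution_area` is our own.

```python
import math

def revolution_area(f, fprime, a, b, n=2000):
    """Midpoint-rule estimate of 2*pi * integral of |f(u)| sqrt(1 + f'(u)^2) over [a, b] (x-axis revolution)."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        u = a + (k + 0.5) * h
        total += abs(f(u)) * math.sqrt(1 + fprime(u) ** 2)
    return 2 * math.pi * total * h

# Cone from revolving y = x, 0 <= x <= 1
cone = revolution_area(lambda u: u, lambda u: 1.0, 0.0, 1.0)
# Unit sphere from revolving y = sqrt(1 - x^2), -1 <= x <= 1 (midpoints avoid the endpoints)
sphere = revolution_area(lambda u: math.sqrt(1 - u * u),
                         lambda u: -u / math.sqrt(1 - u * u), -1.0, 1.0)
print(cone, math.pi * math.sqrt(2), sphere, 4 * math.pi)
```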

3.7 Surface Integrals

3.7.1 Surface Integrals of Scalar-Valued Functions

Previously, we have learned how to integrate functions along curves. If a smooth space curve $C$ is parameterized by a function $\mathbf{r}(t) = \langle x(t), y(t), z(t)\rangle$, $a \le t \le b$, then the arc length $L$ of $C$ is given by the integral

$$L = \int_a^b \|\mathbf{r}'(t)\|\,dt.$$

Similarly, the integral of a scalar-valued function $f(x, y, z)$ along $C$ is given by

$$\int_C f\,ds = \int_a^b f(x(t), y(t), z(t))\,\|\mathbf{r}'(t)\|\,dt.$$


It follows that the integral of $f(x, y, z) \equiv 1$ along $C$ is equal to the arc length of $C$.

We now define integrals of scalar-valued functions on surfaces in an analogous manner. Recall that the area of a smooth surface $S$, parametrized by $\mathbf{r}(u, v) = \langle x(u, v), y(u, v), z(u, v)\rangle$ for $(u, v) \in D$, is given by the integral

$$A(S) = \iint_D \|\mathbf{r}_u \times \mathbf{r}_v\|\,du\,dv.$$

To integrate a scalar-valued function $f(x, y, z)$ over $S$, we assume for simplicity that $D$ is a rectangle, and divide it into sub-rectangles $\{R_{ij}\}$ of dimension $\Delta u$ and $\Delta v$, as we did when we derived the formula for $A(S)$. Then, the function $\mathbf{r}$ maps each sub-rectangle $R_{ij}$ into a surface patch $S_{ij}$ that has area $\Delta S_{ij}$. This area is then multiplied by $f(P^*_{ij})$, where $P^*_{ij}$ is any point on $S_{ij}$.

Letting $\Delta u, \Delta v \to 0$, we obtain the surface integral of $f$ over $S$ to be

$$\iint_S f(x, y, z)\,dS = \lim_{\Delta u, \Delta v \to 0} \sum_{i=1}^{n}\sum_{j=1}^{m} f(P^*_{ij})\,\Delta S_{ij} = \iint_D f(\mathbf{r}(u, v))\,\|\mathbf{r}_u \times \mathbf{r}_v\|\,du\,dv,$$

since, in the limit as $\Delta u, \Delta v \to 0$, we have

$$\Delta S_{ij} \to \|\mathbf{r}_u \times \mathbf{r}_v\|\,\Delta u\,\Delta v.$$

Note that if $f(x, y, z) \equiv 1$, then the surface integral of $f$ over $S$ yields the area of $S$, $A(S)$.

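Example (Stewart, Section 13.7, Exercise 6) Let $S$ be the helicoid with parameterization

$$\mathbf{r}(u, v) = \langle u\cos v, u\sin v, v\rangle, \quad 0 \le u \le 1, \quad 0 \le v \le \pi.$$

Then we have

$$\mathbf{r}_u = \langle \cos v, \sin v, 0\rangle, \qquad \mathbf{r}_v = \langle -u\sin v, u\cos v, 1\rangle,$$

which yields

$$\|\mathbf{r}_u \times \mathbf{r}_v\| = \|\langle \sin v, -\cos v, u\rangle\| = \sqrt{\sin^2 v + \cos^2 v + u^2} = \sqrt{1 + u^2}.$$

It follows that

$$
\begin{aligned}
\iint_S \sqrt{1 + x^2 + y^2}\,dS &= \int_0^1 \int_0^\pi \sqrt{1 + (u\cos v)^2 + (u\sin v)^2}\,\|\mathbf{r}_u \times \mathbf{r}_v\|\,dv\,du \\
&= \int_0^1 \int_0^\pi \sqrt{1 + u^2}\sqrt{1 + u^2}\,dv\,du \\
&= \int_0^1 \int_0^\pi (1 + u^2)\,dv\,du \\
&= \pi\int_0^1 (1 + u^2)\,du \\
&= \pi\left(u + \frac{u^3}{3}\right)\Bigg|_0^1 \\
&= \frac{4\pi}{3}.
\end{aligned}
$$

□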
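The defining formula $\iint_D f(\mathbf{r}(u, v))\,\|\mathbf{r}_u \times \mathbf{r}_v\|\,du\,dv$ translates directly into a double sum. A minimal sketch, assuming the midpoint rule on the parameter rectangle and the hand-computed $\mathbf{r}_u$, $\mathbf{r}_v$ of the helicoid; the function name `surface_integral` is our own.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def surface_integral(f, r, r_u, r_v, a, b, c, d, n=200, m=20):
    """Midpoint-rule estimate of the integral of f(r(u, v)) |r_u x r_v| over [a, b] x [c, d]."""
    du, dv = (b - a) / n, (d - c) / m
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * du
        for j in range(m):
            v = c + (j + 0.5) * dv
            normal = cross(r_u(u, v), r_v(u, v))
            total += f(*r(u, v)) * math.sqrt(sum(w * w for w in normal))
    return total * du * dv

# Helicoid r(u, v) = <u cos v, u sin v, v>, with the partial derivatives from the text
r = lambda u, v: (u * math.cos(v), u * math.sin(v), v)
r_u = lambda u, v: (math.cos(v), math.sin(v), 0.0)
r_v = lambda u, v: (-u * math.sin(v), u * math.cos(v), 1.0)
f = lambda x, y, z: math.sqrt(1 + x * x + y * y)

approx = surface_integral(f, r, r_u, r_v, 0.0, 1.0, 0.0, math.pi)
print(approx, 4 * math.pi / 3)
```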

The surface integral of a scalar-valued function is useful for computing the mass and center of mass of a thin sheet. If the sheet is shaped like a surface $S$, and it has density $\rho(x, y, z)$, then the mass is given by the surface integral

$$m = \iint_S \rho(x, y, z)\,dS,$$

and the center of mass is the point $(\bar{x}, \bar{y}, \bar{z})$, where

$$\bar{x} = \frac{1}{m}\iint_S x\,\rho(x, y, z)\,dS, \qquad \bar{y} = \frac{1}{m}\iint_S y\,\rho(x, y, z)\,dS, \qquad \bar{z} = \frac{1}{m}\iint_S z\,\rho(x, y, z)\,dS.$$

3.7.2 Surface Integrals of Vector Fields

Let v be a vector field defined on R3 that represents the velocity field of afluid, and let ρ be the density of the fluid. Then, the rate of flow of thefluid, which is defined to be the rate of change with respect to time of theamount of fluid (mass), per unit area, is given by ρv.

To determine the total amount of fluid that is crossing S per unit oftime, called the flux across S, we divide S into several small patches Sij ,


as we did when we defined the surface integral of a scalar-valued function.Since each patch Sij is approximately planar (that is, parallel to a plane),we can approximate the flux across Sij by

(ρv · n)A(Sij),

where n is a unit vector that is normal (perpendicular) to Sij . This isbecause if θ is the angle between Sij and the direction of v, then the fluiddirected at Sij is effectively passing through a region of area A(Sij)| cos θ|.

If we sum the flux over each patch and let the areas of the patches approach zero, then we obtain the total flux across $S$,

$$\iint_S \rho(x, y, z)\,\mathbf{v}(x, y, z) \cdot \mathbf{n}(x, y, z)\, dS,$$

where $\mathbf{n}(x, y, z)$ is a continuous function that describes a unit normal vector at each point $(x, y, z)$ on $S$. For a general vector field $\mathbf{F}$, we define the surface integral of $\mathbf{F}$ over $S$ by

$$\iint_S \mathbf{F} \cdot d\mathbf{S} = \iint_S \mathbf{F} \cdot \mathbf{n}\, dS.$$

When $\mathbf{F}$ represents an electric field, we call the surface integral of $\mathbf{F}$ over $S$ the electric flux of $\mathbf{F}$ through $S$. Alternatively, if $\mathbf{F} = -K\nabla u$, where $u$ is a function that represents temperature and $K$ is a constant that represents thermal conductivity, then the surface integral of $\mathbf{F}$ over a surface $S$ is called the heat flow or heat flux across $S$.

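If $S$ is parameterized by a function $\mathbf{r}(u, v)$, where $(u, v) \in D$, then

$$\mathbf{n} = \frac{\mathbf{r}_u \times \mathbf{r}_v}{\|\mathbf{r}_u \times \mathbf{r}_v\|},$$

and we then have

$$\begin{aligned}
\iint_S \mathbf{F} \cdot d\mathbf{S} &= \iint_S \mathbf{F} \cdot \frac{\mathbf{r}_u \times \mathbf{r}_v}{\|\mathbf{r}_u \times \mathbf{r}_v\|}\, dS \\
&= \iint_D \mathbf{F}(\mathbf{r}(u, v)) \cdot \frac{\mathbf{r}_u \times \mathbf{r}_v}{\|\mathbf{r}_u \times \mathbf{r}_v\|}\, \|\mathbf{r}_u \times \mathbf{r}_v\|\, dA \\
&= \iint_D \mathbf{F}(\mathbf{r}(u, v)) \cdot (\mathbf{r}_u \times \mathbf{r}_v)\, dA.
\end{aligned}$$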

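This is analogous to the definition of the line integral of a vector field over a curve $C$,

$$\int_C \mathbf{F} \cdot d\mathbf{r} = \int_C \mathbf{F} \cdot \mathbf{T}\, ds = \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\, dt.$$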
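The parameterized flux formula $\iint_D \mathbf{F}(\mathbf{r}(u, v)) \cdot (\mathbf{r}_u \times \mathbf{r}_v)\, dA$ translates almost directly into code. The sketch below is a minimal midpoint-rule version under the assumption that the partial derivatives $\mathbf{r}_u$ and $\mathbf{r}_v$ are supplied by hand; the orientation is whatever direction $\mathbf{r}_u \times \mathbf{r}_v$ points. The test case (the field $\langle x, y, z\rangle$ through the unit sphere, whose outward flux is $4\pi$) is an illustrative choice, not from the text.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def flux(F, r, r_u, r_v, u0, u1, v0, v1, n=300):
    """Midpoint-rule approximation of the double integral over D of
    F(r(u, v)) . (r_u x r_v) du dv.  The orientation is whichever way
    r_u x r_v points; negate the result for the opposite orientation."""
    du = (u1 - u0) / n
    dv = (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * du
        for j in range(n):
            v = v0 + (j + 0.5) * dv
            total += dot(F(*r(u, v)), cross(r_u(u, v), r_v(u, v)))
    return total * du * dv


# Illustrative case: F = <x, y, z> through the unit sphere.  With (u, v) = (phi, theta),
# r_u x r_v = sin(phi) * r(u, v), which points outward.
F = lambda x, y, z: (x, y, z)
r = lambda p, t: (math.sin(p) * math.cos(t), math.sin(p) * math.sin(t), math.cos(p))
r_p = lambda p, t: (math.cos(p) * math.cos(t), math.cos(p) * math.sin(t), -math.sin(p))
r_t = lambda p, t: (-math.sin(p) * math.sin(t), math.sin(p) * math.cos(t), 0.0)

outward = flux(F, r, r_p, r_t, 0.0, math.pi, 0.0, 2 * math.pi)
print(outward, 4 * math.pi)
```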



Just as the orientation of a curve was relevant to the line integral of a vector field over a curve, the orientation of a surface is relevant to the surface integral of a vector field. We say that a surface $S$ is orientable, or oriented, if, at each point $(x, y, z)$ in $S$, it is possible to choose a unique vector $\mathbf{n}(x, y, z)$ that is normal to the tangent plane of $S$ at $(x, y, z)$, in such a way that $\mathbf{n}(x, y, z)$ varies continuously over $S$. The particular choice of $\mathbf{n}$ is called an orientation.

An orientable surface has two orientations, or, informally, two “sides”, with normal vectors $\mathbf{n}$ and $-\mathbf{n}$. This definition of orientability excludes the Möbius strip: if a unit normal vector is carried continuously once around the strip, it returns to its starting point as its own negative, so no continuously varying choice of $\mathbf{n}$ exists. Geometrically, the Möbius strip can be said to have only one “side”.

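For a surface that is the graph of a function $z = g(x, y)$, if we choose the parameterization

$$x = u, \quad y = v, \quad z = g(u, v),$$

then from

$$\mathbf{r}_u = \langle 1, 0, g_u\rangle, \quad \mathbf{r}_v = \langle 0, 1, g_v\rangle,$$

we obtain

$$\mathbf{r}_u \times \mathbf{r}_v = \langle -g_u, -g_v, 1\rangle = \langle -g_x, -g_y, 1\rangle,$$

which yields

$$\mathbf{n} = \frac{\mathbf{r}_u \times \mathbf{r}_v}{\|\mathbf{r}_u \times \mathbf{r}_v\|} = \frac{\langle -g_x, -g_y, 1\rangle}{\sqrt{1 + g_x^2 + g_y^2}}.$$

Because the $z$-component of this vector is positive, we call this choice of $\mathbf{n}$ an upward orientation of the surface, while $-\mathbf{n}$ is a downward orientation.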

Example (Stewart, Section 13.7, Exercise 22) Let $S$ be the part of the cone $z = \sqrt{x^2 + y^2}$ that lies beneath the plane $z = 1$, with downward orientation. We wish to evaluate the surface integral

$$\iint_S \mathbf{F} \cdot d\mathbf{S},$$

where $\mathbf{F} = \langle x, y, z^4\rangle$. First, we must compute the unit normal vector for $S$. Using cylindrical coordinates yields the parameterization

$$x = u\cos v, \quad y = u\sin v, \quad z = u, \quad 0 \le u \le 1, \quad 0 \le v \le 2\pi.$$


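We then have

$$\mathbf{r}_u = \langle \cos v, \sin v, 1\rangle, \quad \mathbf{r}_v = \langle -u\sin v, u\cos v, 0\rangle,$$

which yields

$$\mathbf{r}_u \times \mathbf{r}_v = \langle -u\cos v, -u\sin v, u\cos^2 v + u\sin^2 v\rangle = u\langle -\cos v, -\sin v, 1\rangle.$$

Because we assume downward orientation, the $z$-component of the normal vector must be negative. Therefore, $\mathbf{r}_u \times \mathbf{r}_v$ must be negated, which yields

$$\iint_S \mathbf{F} \cdot d\mathbf{S} = -\iint_D \mathbf{F}(x(u, v), y(u, v), z(u, v)) \cdot (\mathbf{r}_u \times \mathbf{r}_v)\, dA,$$

where $D$ is the domain of the parameters $u$ and $v$, the rectangle $[0, 1] \times [0, 2\pi]$. Evaluating this integral, we obtain

$$\begin{aligned}
\iint_S \mathbf{F} \cdot d\mathbf{S} &= -\iint_D \langle u\cos v, u\sin v, u^4\rangle \cdot u\langle -\cos v, -\sin v, 1\rangle\, dA \\
&= -\int_0^{2\pi} \int_0^1 (-u\cos^2 v - u\sin^2 v + u^4)\, u\, du\, dv \\
&= \int_0^{2\pi} \int_0^1 (u^2 - u^5)\, du\, dv \\
&= 2\pi \int_0^1 (u^2 - u^5)\, du \\
&= 2\pi \left( \frac{u^3}{3} - \frac{u^6}{6} \right)\Bigg|_0^1 = \frac{\pi}{3}.
\end{aligned}$$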

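An alternative approach is to retain Cartesian coordinates, and then use the formula for the unit normal for a downward orientation of a surface that is the graph of a function $z = g(x, y)$,

$$\mathbf{n} = -\frac{\langle -g_x, -g_y, 1\rangle}{\sqrt{g_x^2 + g_y^2 + 1}} = \frac{1}{\sqrt{2}} \left\langle \frac{x}{\sqrt{x^2 + y^2}}, \frac{y}{\sqrt{x^2 + y^2}}, -1 \right\rangle,$$

where we used the fact that $g_x^2 + g_y^2 = 1$ when $g(x, y) = \sqrt{x^2 + y^2}$. This approach still requires a conversion to polar coordinates to integrate over the unit disk in the $xy$-plane. □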
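Both routes through this example can be checked numerically. The sketch below assumes the cylindrical parameterization used above, negates $\mathbf{r}_u \times \mathbf{r}_v$ for the downward orientation, and approximates the flux with a midpoint rule; it also compares the downward unit normal from the graph formula with the normalized, negated cross product at one sample point. The function names are illustrative.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cone_flux_downward(n=300):
    """Midpoint-rule estimate of the flux of F = <x, y, z^4> through the cone
    z = sqrt(x^2 + y^2), 0 <= z <= 1, with downward orientation."""
    du, dv = 1.0 / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        for j in range(n):
            v = (j + 0.5) * dv
            x, y, z = u * math.cos(v), u * math.sin(v), u
            r_u = (math.cos(v), math.sin(v), 1.0)
            r_v = (-u * math.sin(v), u * math.cos(v), 0.0)
            # r_u x r_v points upward, so subtract to obtain the downward flux
            total -= dot((x, y, z ** 4), cross(r_u, r_v))
    return total * du * dv


def unit_normal_down_cartesian(x, y):
    """Downward unit normal from the graph formula with g = sqrt(x^2 + y^2)."""
    s = math.hypot(x, y)
    return (x / (s * math.sqrt(2)), y / (s * math.sqrt(2)), -1 / math.sqrt(2))


def unit_normal_down_param(u, v):
    """Downward unit normal obtained by negating and normalizing r_u x r_v."""
    N = cross((math.cos(v), math.sin(v), 1.0), (-u * math.sin(v), u * math.cos(v), 0.0))
    length = math.sqrt(dot(N, N))
    return tuple(-c / length for c in N)


flux = cone_flux_downward()
print(flux, math.pi / 3)
# The point (x, y) = (0.3, 0.4) corresponds to u = 0.5, v = atan2(0.4, 0.3).
print(unit_normal_down_cartesian(0.3, 0.4), unit_normal_down_param(0.5, math.atan2(0.4, 0.3)))
```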


For a closed surface $S$ that is the boundary of a solid region $E$, we define the positive orientation of $S$ to be the choice of $\mathbf{n}$ that consistently points outward from $E$, while the inward-pointing normals define the negative orientation.

Example (Stewart, Section 13.7, Exercise 26) To evaluate the surface integral

$$\iint_S \mathbf{F} \cdot d\mathbf{S},$$

where $\mathbf{F}(x, y, z) = \langle y, z - y, x\rangle$ and $S$ is the surface of the tetrahedron with vertices $(0, 0, 0)$, $(1, 0, 0)$, $(0, 1, 0)$, and $(0, 0, 1)$, we must evaluate surface integrals over each of the four faces of the tetrahedron separately. We assume positive (outward) orientation.

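For the first side, $S_1$, with vertices $(0, 0, 0)$, $(1, 0, 0)$ and $(0, 0, 1)$, we first parameterize the side using

$$x = u, \quad y = 0, \quad z = v, \quad 0 \le u \le 1, \quad 0 \le v \le 1 - u.$$

Then, from

$$\mathbf{r}_u = \langle 1, 0, 0\rangle, \quad \mathbf{r}_v = \langle 0, 0, 1\rangle,$$

we obtain

$$\mathbf{r}_u \times \mathbf{r}_v = \langle 0, -1, 0\rangle.$$

This vector points outside the tetrahedron, so it is the outward normal vector that we wish to use. Therefore, the surface integral of $\mathbf{F}$ over $S_1$ is

$$\begin{aligned}
\iint_{S_1} \mathbf{F} \cdot d\mathbf{S} &= \int_0^1 \int_0^{1-u} \langle 0, v - 0, u\rangle \cdot \langle 0, -1, 0\rangle\, dv\, du \\
&= -\int_0^1 \int_0^{1-u} v\, dv\, du \\
&= -\int_0^1 \frac{v^2}{2}\Bigg|_0^{1-u} du \\
&= -\frac{1}{2} \int_0^1 (1 - u)^2\, du \\
&= \frac{1}{2} \cdot \frac{(1 - u)^3}{3}\Bigg|_0^1 = -\frac{1}{6}.
\end{aligned}$$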


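For the second side, $S_2$, with vertices $(0, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$, we parameterize using

$$x = 0, \quad y = u, \quad z = v, \quad 0 \le u \le 1, \quad 0 \le v \le 1 - u.$$

Then, from

$$\mathbf{r}_u = \langle 0, 1, 0\rangle, \quad \mathbf{r}_v = \langle 0, 0, 1\rangle,$$

we obtain

$$\mathbf{r}_u \times \mathbf{r}_v = \langle 1, 0, 0\rangle.$$

This vector points inside the tetrahedron, so we must negate it to obtain the outward normal vector. Therefore, the surface integral of $\mathbf{F}$ over $S_2$ is

$$\begin{aligned}
\iint_{S_2} \mathbf{F} \cdot d\mathbf{S} &= \int_0^1 \int_0^{1-u} \langle u, v - u, 0\rangle \cdot \langle -1, 0, 0\rangle\, dv\, du \\
&= -\int_0^1 \int_0^{1-u} u\, dv\, du \\
&= \int_0^1 u(u - 1)\, du \\
&= \left( \frac{u^3}{3} - \frac{u^2}{2} \right)\Bigg|_0^1 = -\frac{1}{6}.
\end{aligned}$$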

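For the base $S_3$, with vertices $(0, 0, 0)$, $(1, 0, 0)$ and $(0, 1, 0)$, we parameterize using

$$x = u, \quad y = v, \quad z = 0, \quad 0 \le u \le 1, \quad 0 \le v \le 1 - u.$$

Then, from

$$\mathbf{r}_u = \langle 1, 0, 0\rangle, \quad \mathbf{r}_v = \langle 0, 1, 0\rangle,$$

we obtain

$$\mathbf{r}_u \times \mathbf{r}_v = \langle 0, 0, 1\rangle.$$

This vector points inside the tetrahedron, so we must negate it to obtain the outward normal vector. Therefore, the surface integral of $\mathbf{F}$ over $S_3$ is

$$\begin{aligned}
\iint_{S_3} \mathbf{F} \cdot d\mathbf{S} &= \int_0^1 \int_0^{1-u} \langle v, 0 - v, u\rangle \cdot \langle 0, 0, -1\rangle\, dv\, du \\
&= -\int_0^1 \int_0^{1-u} u\, dv\, du \\
&= \int_0^1 u(u - 1)\, du \\
&= \left( \frac{u^3}{3} - \frac{u^2}{2} \right)\Bigg|_0^1 = -\frac{1}{6}.
\end{aligned}$$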

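Finally, for the “top” face $S_4$, with vertices $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$, we parameterize using

$$x = u, \quad y = v, \quad z = 1 - u - v, \quad 0 \le u \le 1, \quad 0 \le v \le 1 - u,$$

since the equation of the plane containing this face is $x + y + z - 1 = 0$. This can be determined by using the three vertices to obtain two vectors within the plane, and then computing their cross product to obtain the plane's normal vector.

Then, from

$$\mathbf{r}_u = \langle 1, 0, -1\rangle, \quad \mathbf{r}_v = \langle 0, 1, -1\rangle,$$

we obtain

$$\mathbf{r}_u \times \mathbf{r}_v = \langle 1, 1, 1\rangle.$$

This vector points outside the tetrahedron, so it is the outward normal vector that we wish to use. Therefore, the surface integral of $\mathbf{F}$ over $S_4$ is

$$\begin{aligned}
\iint_{S_4} \mathbf{F} \cdot d\mathbf{S} &= \int_0^1 \int_0^{1-u} \langle v, 1 - u - 2v, u\rangle \cdot \langle 1, 1, 1\rangle\, dv\, du \\
&= \int_0^1 \int_0^{1-u} (1 - v)\, dv\, du \\
&= \int_0^1 \left( v - \frac{v^2}{2} \right)\Bigg|_0^{1-u} du \\
&= \int_0^1 \left[ 1 - u - \frac{(1 - u)^2}{2} \right] du \\
&= \int_0^1 \left( \frac{1}{2} - \frac{1}{2}u^2 \right) du \\
&= \left( \frac{u}{2} - \frac{u^3}{6} \right)\Bigg|_0^1 = \frac{1}{3}.
\end{aligned}$$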


Adding the four integrals together yields

$$\iint_S \mathbf{F} \cdot d\mathbf{S} = -\frac{1}{6} - \frac{1}{6} - \frac{1}{6} + \frac{1}{3} = -\frac{1}{6}.$$

□
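The four face integrals can be cross-checked numerically. The sketch below integrates over the triangle $0 \le u \le 1$, $0 \le v \le 1 - u$ by substituting $v = (1 - u)s$ and applying a midpoint rule, using each face's parameterization and (possibly negated) normal vector exactly as chosen above. As an independent check, $\operatorname{div}\mathbf{F} = -1$ and the tetrahedron has volume $1/6$, so the Divergence Theorem predicts the same total of $-1/6$. The helper names are illustrative.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def triangle_integral(g, n=200):
    """Integrate g(u, v) over 0 <= u <= 1, 0 <= v <= 1 - u by substituting
    v = (1 - u) s with 0 <= s <= 1 (so dv = (1 - u) ds) and applying the
    midpoint rule on the unit square."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            s = (j + 0.5) * h
            total += g(u, (1 - u) * s) * (1 - u)
    return total * h * h


F = lambda x, y, z: (y, z - y, x)

# Each face: (parameterization r(u, v), outward normal vector used in the text).
faces = [
    (lambda u, v: (u, 0.0, v), (0.0, -1.0, 0.0)),          # S1, y = 0
    (lambda u, v: (0.0, u, v), (-1.0, 0.0, 0.0)),          # S2, x = 0 (negated)
    (lambda u, v: (u, v, 0.0), (0.0, 0.0, -1.0)),          # S3, z = 0 (negated)
    (lambda u, v: (u, v, 1.0 - u - v), (1.0, 1.0, 1.0)),  # S4, x + y + z = 1
]

face_fluxes = [
    triangle_integral(lambda u, v, r=r, N=N: dot(F(*r(u, v)), N))
    for r, N in faces
]
total_flux = sum(face_fluxes)
print(face_fluxes, total_flux)  # about [-1/6, -1/6, -1/6, 1/3] and -1/6
```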

3.8 Stokes’ Theorem

Let $C$ be a simple, closed, positively oriented, piecewise smooth plane curve, and let $D$ be the region that it encloses. According to one of the forms of Green's Theorem, for a vector field $\mathbf{F}$ with continuous first partial derivatives on $D$, we have

$$\int_C \mathbf{F} \cdot d\mathbf{r} = \iint_D (\operatorname{curl}\mathbf{F}) \cdot \mathbf{k}\, dA,$$

where $\mathbf{k} = \langle 0, 0, 1\rangle$. By noting that $\mathbf{k}$ is normal to the region $D$ when it is embedded in 3-D space, we can generalize this form of Green's Theorem to more general surfaces that are bounded by simple, closed, piecewise smooth, positively oriented space curves. Let $S$ be an oriented, piecewise smooth surface that is bounded by such a curve $C$. If we divide $S$ into several small patches $S_{ij}$, then these patches are approximately planar. We can apply Green's Theorem, approximately, to each patch by rotating it in space so that its unit normal vector is $\mathbf{k}$, and using the fact that rotating two vectors $\mathbf{u}$ and $\mathbf{v}$ in space does not change the value of $\mathbf{u} \cdot \mathbf{v}$.

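Most of the line integrals along the boundary curves of the individual patches cancel with one another, because each interior edge is traversed once in each direction by the positively oriented boundaries of the two patches that share it, and we are left with the line integral over $C$, the boundary of $S$. If we take the limit as the size of the patches approaches zero, we then obtain

$$\int_C \mathbf{F} \cdot d\mathbf{r} = \iint_S \operatorname{curl}\mathbf{F} \cdot d\mathbf{S} = \iint_S \operatorname{curl}\mathbf{F} \cdot \mathbf{n}\, dS,$$

where $\mathbf{n}$ is the unit normal vector of $S$. This result is known as Stokes' Theorem.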

Stokes' Theorem can be used to evaluate either a surface integral of a curl or the line integral over the curve that bounds the surface, whichever is easier.

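Example (Stewart, Section 13.8, Exercise 2) Let $\mathbf{F}(x, y, z) = \langle yz, xz, xy\rangle$ and let $S$ be the part of the paraboloid $z = 9 - x^2 - y^2$ that lies above the plane $z = 5$, with upward orientation. By Stokes' Theorem,

$$\iint_S \operatorname{curl}\mathbf{F} \cdot d\mathbf{S} = \int_C \mathbf{F} \cdot d\mathbf{r},$$

where $C$ is the boundary curve of $S$, which is a circle of radius 2 centered at $(0, 0, 5)$ and parallel to the $xy$-plane. It can therefore be parameterized by

$$x = 2\cos t, \quad y = 2\sin t, \quad z = 5, \quad 0 \le t \le 2\pi.$$

Its tangent vector is then

$$\mathbf{r}'(t) = \langle -2\sin t, 2\cos t, 0\rangle.$$

We then have

$$\begin{aligned}
\iint_S \operatorname{curl}\mathbf{F} \cdot d\mathbf{S} &= \int_0^{2\pi} \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\, dt \\
&= \int_0^{2\pi} \langle 10\sin t, 10\cos t, 4\cos t\sin t\rangle \cdot \langle -2\sin t, 2\cos t, 0\rangle\, dt \\
&= \int_0^{2\pi} (-20\sin^2 t + 20\cos^2 t)\, dt \\
&= 20 \int_0^{2\pi} \cos 2t\, dt \\
&= 10\sin 2t\,\Big|_0^{2\pi} = 0.
\end{aligned}$$

This result can also be obtained by noting that because $\mathbf{F} = \nabla f$, where $f(x, y, z) = xyz$, it follows that $\operatorname{curl}\mathbf{F} = \mathbf{0}$. □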
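Both observations in this example can be checked numerically: the line integral around the circle, and the claim that $\operatorname{curl}\mathbf{F} = \mathbf{0}$. The sketch below uses a midpoint rule for the line integral and central differences for the curl; the step size `h` and the sample point are arbitrary choices for illustration, and the helper names are hypothetical.

```python
import math


def line_integral(F, r, dr, a, b, n=2000):
    """Midpoint-rule approximation of the integral of F(r(t)) . r'(t) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += sum(f * d for f, d in zip(F(*r(t)), dr(t)))
    return total * h


def curl(F, p, h=1e-5):
    """Central-difference approximation of curl F = <R_y - Q_z, P_z - R_x, Q_x - P_y> at p."""
    def partial(i, j):  # derivative of component i with respect to coordinate j
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        return (F(*plus)[i] - F(*minus)[i]) / (2 * h)
    return (partial(2, 1) - partial(1, 2),
            partial(0, 2) - partial(2, 0),
            partial(1, 0) - partial(0, 1))


F = lambda x, y, z: (y * z, x * z, x * y)
r = lambda t: (2 * math.cos(t), 2 * math.sin(t), 5.0)
dr = lambda t: (-2 * math.sin(t), 2 * math.cos(t), 0.0)

circulation = line_integral(F, r, dr, 0.0, 2 * math.pi)
print(circulation)                 # essentially 0
print(curl(F, (0.3, -1.2, 2.0)))  # essentially (0, 0, 0), since F = grad(xyz)
```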

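Example (Stewart, Section 13.8, Exercise 8) We wish to evaluate the line integral of $\mathbf{F}(x, y, z) = \langle xy, 2z, 3y\rangle$ over the curve $C$ that is the intersection of the cylinder $x^2 + y^2 = 9$ with the plane $x + z = 5$, oriented counterclockwise when viewed from above.

To describe the surface $S$ enclosed by $C$, we use the parameterization

$$x = u\cos v, \quad y = u\sin v, \quad z = 5 - u\cos v, \quad 0 \le u \le 3, \quad 0 \le v \le 2\pi.$$

Using

$$\mathbf{r}_u = \langle \cos v, \sin v, -\cos v\rangle, \quad \mathbf{r}_v = \langle -u\sin v, u\cos v, u\sin v\rangle,$$

we obtain

$$\mathbf{r}_u \times \mathbf{r}_v = \langle u, 0, u\rangle.$$

We then compute

$$\operatorname{curl}\mathbf{F} = \left\langle \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z} \right\rangle \times \langle xy, 2z, 3y\rangle = \langle 1, 0, -x\rangle.$$

Let $D$ be the domain of the parameters,

$$D = \{(u, v) \mid 0 \le u \le 3,\ 0 \le v \le 2\pi\}.$$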

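We then apply Stokes' Theorem and obtain

$$\begin{aligned}
\int_C \mathbf{F} \cdot d\mathbf{r} &= \iint_S \operatorname{curl}\mathbf{F} \cdot d\mathbf{S} \\
&= \iint_D \operatorname{curl}\mathbf{F}(\mathbf{r}(u, v)) \cdot (\mathbf{r}_u \times \mathbf{r}_v)\, dA \\
&= \int_0^3 \int_0^{2\pi} \langle 1, 0, -u\cos v\rangle \cdot \langle u, 0, u\rangle\, dv\, du \\
&= \int_0^3 \int_0^{2\pi} (u - u^2\cos v)\, dv\, du \\
&= \int_0^3 (uv - u^2\sin v)\Big|_0^{2\pi} du \\
&= 2\pi \int_0^3 u\, du \\
&= 2\pi\, \frac{u^2}{2}\Bigg|_0^3 = 9\pi.
\end{aligned}$$

□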
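To see Stokes' Theorem at work in this example, the sketch below approximates both sides numerically: the surface integral over $D$ using the curl and the normal $\langle u, 0, u\rangle$ computed above, and the line integral around $C$ parameterized directly as $\mathbf{r}(t) = \langle 3\cos t, 3\sin t, 5 - 3\cos t\rangle$. That curve parameterization is an assumption (it traverses $C$ counterclockwise when viewed from above, matching the upward-pointing normal); the function names are illustrative.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def surface_side(n=200):
    """Midpoint estimate of the double integral over D of curl F(r(u, v)) . (r_u x r_v)."""
    du, dv = 3.0 / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        for j in range(n):
            v = (j + 0.5) * dv
            x = u * math.cos(v)
            curl_F = (1.0, 0.0, -x)  # curl <xy, 2z, 3y> = <1, 0, -x>
            normal = (u, 0.0, u)     # r_u x r_v from the text
            total += dot(curl_F, normal)
    return total * du * dv


def line_side(n=2000):
    """Midpoint estimate of the line integral of F around C (counterclockwise from above)."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y, z = 3 * math.cos(t), 3 * math.sin(t), 5 - 3 * math.cos(t)
        F = (x * y, 2 * z, 3 * y)
        dr = (-3 * math.sin(t), 3 * math.cos(t), 3 * math.sin(t))
        total += dot(F, dr)
    return total * h


print(surface_side(), line_side(), 9 * math.pi)
```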

Stokes' Theorem can also be used to provide insight into the physical interpretation of the curl of a vector field. Let $S_a$ be a disk of radius $a$ centered at a point $P_0$, and let $C_a$ be its boundary. Furthermore, let $\mathbf{v}$ be a velocity field for a fluid. Then the line integral

$$\int_{C_a} \mathbf{v} \cdot d\mathbf{r} = \int_{C_a} \mathbf{v} \cdot \mathbf{T}\, ds,$$

where $\mathbf{T}$ is the unit tangent vector of $C_a$, measures the tendency of the fluid to move around $C_a$. This quantity, called the circulation of $\mathbf{v}$ around $C_a$, is greatest when the fluid velocity vector is consistently parallel to the unit tangent vector; that is, the circulation around $C_a$ is maximized when the fluid follows the path of $C_a$.

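Now, by Stokes' Theorem,

$$\begin{aligned}
\int_{C_a} \mathbf{v} \cdot d\mathbf{r} &= \iint_{S_a} \operatorname{curl}\mathbf{v} \cdot d\mathbf{S} \\
&= \iint_{S_a} \operatorname{curl}\mathbf{v} \cdot \mathbf{n}\, dS \\
&\approx \operatorname{curl}\mathbf{v}(P_0) \cdot \mathbf{n}(P_0) \iint_{S_a} 1\, dS \\
&= \pi a^2 \operatorname{curl}\mathbf{v}(P_0) \cdot \mathbf{n}(P_0).
\end{aligned}$$

As $a \to 0$ and $S_a$ collapses to the point $P_0$, this approximation improves, and we obtain

$$\operatorname{curl}\mathbf{v}(P_0) \cdot \mathbf{n}(P_0) = \lim_{a \to 0} \frac{1}{\pi a^2} \int_{C_a} \mathbf{v} \cdot d\mathbf{r}.$$

This shows that circulation is maximized when the axis around which the fluid is circulating, $\mathbf{n}(P_0)$, is parallel to $\operatorname{curl}\mathbf{v}$. That is, the direction of $\operatorname{curl}\mathbf{v}$ indicates the axis around which the greatest circulation occurs.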
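The limit formula above can be watched converging numerically. The sketch below uses a hypothetical velocity field $\mathbf{v} = \langle -y^3, x^3, 0\rangle$ (not from the text), for which $\operatorname{curl}\mathbf{v} = \langle 0, 0, 3x^2 + 3y^2\rangle$, so at $P_0 = (1, 0, 0)$ with $\mathbf{n} = \mathbf{k}$ the limit should be 3. For this particular field a short hand computation gives the ratio exactly as $3 + \tfrac{3}{2}a^2$, which the numbers should reproduce.

```python
import math


def circulation(v, center, a, n=1000):
    """Midpoint-rule estimate of the line integral of v around the circle of
    radius a centered at `center` in the plane z = center[2], traversed
    counterclockwise when viewed from above (so the unit normal is k)."""
    cx, cy, cz = center
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        p = (cx + a * math.cos(t), cy + a * math.sin(t), cz)
        dr = (-a * math.sin(t), a * math.cos(t), 0.0)
        total += sum(f * d for f, d in zip(v(*p), dr))
    return total * h


# Hypothetical field: curl v = <0, 0, 3x^2 + 3y^2>, so curl v(P0) . k = 3 at P0 = (1, 0, 0).
v = lambda x, y, z: (-y ** 3, x ** 3, 0.0)
P0 = (1.0, 0.0, 0.0)
ratios = {a: circulation(v, P0, a) / (math.pi * a * a) for a in (0.5, 0.1, 0.01)}
for a, q in ratios.items():
    print(a, q)  # approaches 3 as the radius a shrinks
```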

3.8.1 A Note About Orientation

Recall Stokes' Theorem,

$$\int_C \mathbf{F} \cdot d\mathbf{r} = \iint_S \operatorname{curl}\mathbf{F} \cdot d\mathbf{S},$$

where $C$ is a simple, closed, positively oriented, piecewise smooth curve and $S$ is an oriented surface bounded by $C$. If $C$ is parameterized by a function $\mathbf{r}(t)$, where $a \le t \le b$, and $S$ is parameterized by a function $\mathbf{g}(u, v)$, where $(u, v) \in D$, then Stokes' Theorem becomes

$$\int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\, dt = \iint_D \operatorname{curl}\mathbf{F}(\mathbf{g}(u, v)) \cdot (\mathbf{g}_u \times \mathbf{g}_v)\, du\, dv.$$

It is important that the parameterizations $\mathbf{r}$ and $\mathbf{g}$ have the proper orientation for Stokes' Theorem to apply. This is why $C$ is required to have positive orientation. Informally, this means that if one walks along $C$ with one's head pointing in the direction of $\mathbf{n}$, the unit normal vector of $S$, then $S$ is always on the left relative to the path traced along $C$.

It follows that the parameterizations of $C$ and $S$ must be consistent with one another to ensure that they are oriented properly. Otherwise, one of the parameterizations must be reversed, so that the sign of the corresponding integral is corrected. The orientation of a curve can be reversed by changing the parameter to $s = a + b - t$. The orientation of a surface can be reversed by interchanging the variables $u$ and $v$, since $\mathbf{g}_v \times \mathbf{g}_u = -(\mathbf{g}_u \times \mathbf{g}_v)$.
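Both orientation-reversal tricks can be checked directly. In the sketch below (an illustration with hypothetical choices: the field $\langle -y, x, 0\rangle$, the unit circle, and the unit disk $\mathbf{g}(u, v) = \langle u\cos v, u\sin v, 0\rangle$), the substitution $s = a + b - t$ flips the sign of the line integral, and swapping $u$ and $v$ negates the cross product $\mathbf{g}_u \times \mathbf{g}_v$ at every point.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_integral(F, r, dr, a, b, n=1000):
    """Midpoint-rule approximation of the integral of F(r(t)) . r'(t) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += sum(f * d for f, d in zip(F(*r(t)), dr(t)))
    return total * h


F = lambda x, y, z: (-y, x, 0.0)
a, b = 0.0, 2 * math.pi
r = lambda t: (math.cos(t), math.sin(t), 0.0)
dr = lambda t: (-math.sin(t), math.cos(t), 0.0)

# Reversed curve: rho(s) = r(a + b - s), so rho'(s) = -r'(a + b - s) by the chain rule.
rho = lambda s: r(a + b - s)
drho = lambda s: tuple(-c for c in dr(a + b - s))

forward = line_integral(F, r, dr, a, b)
backward = line_integral(F, rho, drho, a, b)

# Disk g(u, v) = <u cos v, u sin v, 0>; h(u, v) = g(v, u) interchanges u and v.
g_u = lambda u, v: (math.cos(v), math.sin(v), 0.0)
g_v = lambda u, v: (-u * math.sin(v), u * math.cos(v), 0.0)
h_u = lambda u, v: g_v(v, u)  # chain rule: d/du of g(v, u) is g_v evaluated at (v, u)
h_v = lambda u, v: g_u(v, u)

print(forward, backward)  # about 2*pi and -2*pi
print(cross(g_u(0.3, 1.2), g_v(0.3, 1.2)), cross(h_u(1.2, 0.3), h_v(1.2, 0.3)))
```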


3.9 The Divergence Theorem

Let $\mathbf{F}$ be a vector field with continuous first partial derivatives. Recall a statement of Green's Theorem,

$$\int_C \mathbf{F} \cdot \mathbf{n}\, ds = \iint_D \operatorname{div}\mathbf{F}\, dA,$$

where $\mathbf{n}$ is the outward unit normal vector of $D$ on its boundary $C$. Now, let $E$ be a three-dimensional solid whose boundary, denoted by $\partial E$, is a closed surface $S$ with positive orientation. If we consider two-dimensional slices of $E$, each one parallel to the $xy$-plane, then each slice is a region $D$ with positively oriented boundary $C$, to which Green's Theorem applies. If we multiply the integrals on both sides of Green's Theorem, as applied to each slice, by $dz$, the infinitesimal “thickness” of each slice, and sum over all slices, then we obtain

$$\iint_S \mathbf{F} \cdot \mathbf{n}\, dS = \iiint_E \operatorname{div}\mathbf{F}\, dV,$$

or, equivalently,

$$\iint_S \mathbf{F} \cdot d\mathbf{S} = \iiint_E \nabla \cdot \mathbf{F}\, dV.$$

This result is known as the Gauss Divergence Theorem, or simply the Divergence Theorem.

As the Divergence Theorem relates the surface integral of a vector field, known as the flux of the vector field through the surface, to an integral of its divergence over a solid, it is quite useful for converting potentially difficult double integrals into triple integrals that may be much easier to evaluate, as the following example demonstrates.

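Example (Stewart, Section 13.9, Exercise 6) Let $S$ be the surface of the box with vertices $(\pm 1, \pm 2, \pm 3)$, and let $\mathbf{F}(x, y, z) = \langle x^2z^3, 2xyz^3, xz^4\rangle$. Computing the surface integral of $\mathbf{F}$ over $S$ directly is quite tedious, because $S$ has six faces that must be handled separately. Instead, we apply the Divergence Theorem to integrate $\operatorname{div}\mathbf{F}$ over $E$, the interior of the box. We then have

$$\begin{aligned}
\iint_S \mathbf{F} \cdot d\mathbf{S} &= \iiint_E \operatorname{div}\mathbf{F}\, dV \\
&= \int_{-1}^1 \int_{-2}^2 \int_{-3}^3 \left[ (x^2z^3)_x + (2xyz^3)_y + (xz^4)_z \right] dz\, dy\, dx \\
&= \int_{-1}^1 \int_{-2}^2 \int_{-3}^3 (2xz^3 + 2xz^3 + 4xz^3)\, dz\, dy\, dx \\
&= \int_{-1}^1 \int_{-2}^2 \int_{-3}^3 8xz^3\, dz\, dy\, dx \\
&= 32 \int_{-1}^1 x \int_{-3}^3 z^3\, dz\, dx \\
&= 32 \int_{-1}^1 x \left[ \frac{z^4}{4}\Bigg|_{-3}^3 \right] dx = 0.
\end{aligned}$$

□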
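For comparison, the "tedious" direct route can be automated. The sketch below sums the outward flux through all six faces of the box with a midpoint rule on each face; it is only a cross-check of the Divergence Theorem result, and the helper `rect_midpoint` is an illustrative name. Each face turns out to contribute essentially nothing on its own here, because every integrand is odd in one of the face variables.

```python
def F(x, y, z):
    return (x * x * z ** 3, 2 * x * y * z ** 3, x * z ** 4)


def rect_midpoint(g, a0, a1, b0, b1, n=100):
    """Midpoint-rule approximation of the integral of g(a, b) over [a0, a1] x [b0, b1]."""
    ha, hb = (a1 - a0) / n, (b1 - b0) / n
    total = 0.0
    for i in range(n):
        a = a0 + (i + 0.5) * ha
        for j in range(n):
            total += g(a, b0 + (j + 0.5) * hb)
    return total * ha * hb


X, Y, Z = 1.0, 2.0, 3.0  # half-widths of the box with vertices (+-1, +-2, +-3)
face_fluxes = []
for s in (1.0, -1.0):  # s = +1 for the far face in each direction, -1 for the near face
    # faces x = s*X with outward normal <s, 0, 0>
    face_fluxes.append(rect_midpoint(lambda y, z: s * F(s * X, y, z)[0], -Y, Y, -Z, Z))
    # faces y = s*Y with outward normal <0, s, 0>
    face_fluxes.append(rect_midpoint(lambda x, z: s * F(x, s * Y, z)[1], -X, X, -Z, Z))
    # faces z = s*Z with outward normal <0, 0, s>
    face_fluxes.append(rect_midpoint(lambda x, y: s * F(x, y, s * Z)[2], -X, X, -Y, Y))

total = sum(face_fluxes)
print(face_fluxes, total)  # every face, and hence the total, is essentially 0
```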

The Divergence Theorem can also be used to convert a difficult surface integral into an easier one.

Example (Stewart, Section 13.9, Exercise 17) Let $\mathbf{F}(x, y, z) = \langle z^2x, \frac{1}{3}y^3 + \tan z, x^2z + y^2\rangle$, and let $S$ be the top half of the sphere $x^2 + y^2 + z^2 = 1$, oriented outward. To evaluate the surface integral of $\mathbf{F}$ over $S$, we combine $S$ with $S_1$, the disk $x^2 + y^2 \le 1$ in the $xy$-plane with downward orientation. We then obtain a new surface $S_2$ that is the boundary of the top half of the ball $x^2 + y^2 + z^2 \le 1$, which we denote by $E$. By the Divergence Theorem,

$$\iint_S \mathbf{F} \cdot d\mathbf{S} + \iint_{S_1} \mathbf{F} \cdot d\mathbf{S} = \iint_{S_2} \mathbf{F} \cdot d\mathbf{S} = \iiint_E \operatorname{div}\mathbf{F}\, dV.$$

We parameterize $S_1$ by

$$x = u\sin v, \quad y = u\cos v, \quad z = 0, \quad 0 \le u \le 1, \quad 0 \le v \le 2\pi.$$

This parameterization is used instead of the usual one arising from polar coordinates, due to the downward orientation. It follows from

$$\mathbf{r}_u = \langle \sin v, \cos v, 0\rangle, \quad \mathbf{r}_v = \langle u\cos v, -u\sin v, 0\rangle$$

that

$$\mathbf{r}_u \times \mathbf{r}_v = \langle 0, 0, -u\sin^2 v - u\cos^2 v\rangle = u\langle 0, 0, -1\rangle,$$

which points downward, as desired. From

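$$\operatorname{div}\mathbf{F}(x, y, z) = (z^2x)_x + \left( \frac{1}{3}y^3 + \tan z \right)_y + (x^2z + y^2)_z = x^2 + y^2 + z^2,$$

which suggests the use of spherical coordinates for the integral over $E$, we obtain

$$\begin{aligned}
\iint_S \mathbf{F} \cdot d\mathbf{S} &= \iiint_E \operatorname{div}\mathbf{F}\, dV - \iint_{S_1} \mathbf{F} \cdot d\mathbf{S} \\
&= \iiint_E (x^2 + y^2 + z^2)\, dV - \int_0^1 \int_0^{2\pi} \mathbf{F}(x(u, v), y(u, v), z(u, v)) \cdot u\langle 0, 0, -1\rangle\, dv\, du \\
&= \int_0^1 \int_0^{2\pi} \int_0^{\pi/2} \rho^2 \rho^2 \sin\phi\, d\phi\, d\theta\, d\rho + \int_0^1 \int_0^{2\pi} u(u^2\cos^2 v)\, dv\, du \\
&= 2\pi \int_0^1 \rho^4 \int_0^{\pi/2} \sin\phi\, d\phi\, d\rho + \int_0^1 u^3 \int_0^{2\pi} \cos^2 v\, dv\, du \\
&= 2\pi \int_0^1 \rho^4 \left[ -\cos\phi\,\Big|_0^{\pi/2} \right] d\rho + \int_0^1 u^3 \int_0^{2\pi} \frac{1 + \cos 2v}{2}\, dv\, du \\
&= 2\pi \int_0^1 \rho^4\, d\rho + \int_0^1 u^3 \left[ \frac{v}{2} + \frac{\sin 2v}{4} \right]_0^{2\pi} du \\
&= 2\pi\, \frac{\rho^5}{5}\Bigg|_0^1 + \pi \int_0^1 u^3\, du \\
&= \frac{2\pi}{5} + \pi\, \frac{u^4}{4}\Bigg|_0^1 = \frac{2\pi}{5} + \frac{\pi}{4} = \frac{13\pi}{20}.
\end{aligned}$$

□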
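Because the point of this example is to avoid a hard surface integral, it is reassuring to compute that integral directly and compare. The sketch below parameterizes the upper unit hemisphere by spherical angles, for which $\mathbf{r}_\phi \times \mathbf{r}_\theta = \sin\phi\, \mathbf{r}(\phi, \theta)$ points outward, and approximates the flux with a midpoint rule; the result should be close to $13\pi/20$. The function name `hemisphere_flux` is illustrative.

```python
import math


def F(x, y, z):
    return (z * z * x, y ** 3 / 3 + math.tan(z), x * x * z + y * y)


def hemisphere_flux(n=300):
    """Midpoint-rule estimate of the outward flux of F through the upper unit hemisphere,
    parameterized by r(phi, theta) = <sin phi cos theta, sin phi sin theta, cos phi>."""
    hp, ht = (math.pi / 2) / n, (2 * math.pi) / n
    total = 0.0
    for i in range(n):
        p = (i + 0.5) * hp
        for j in range(n):
            t = (j + 0.5) * ht
            x, y, z = math.sin(p) * math.cos(t), math.sin(p) * math.sin(t), math.cos(p)
            # r_phi x r_theta = sin(phi) * r(phi, theta), which points outward
            N = (math.sin(p) * x, math.sin(p) * y, math.sin(p) * z)
            total += sum(f * c for f, c in zip(F(x, y, z), N))
    return total * hp * ht


direct = hemisphere_flux()
print(direct, 13 * math.pi / 20)
```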

Suppose that $\mathbf{F}$ is a vector field that, at any point, represents the flow rate of heat energy, which is the rate of change, with respect to time, of the amount of heat energy flowing through that point. By Fourier's Law, $\mathbf{F} = -K\nabla T$, where $K$ is a constant called thermal conductivity, and $T$ is a function that indicates temperature.

Now, let $E$ be a three-dimensional solid enclosed by a closed, positively oriented surface $S$ with outward unit normal vector $\mathbf{n}$. Then, by the law of conservation of energy, the rate of change, with respect to time, of the amount of heat energy inside $E$ is equal to the flow rate, or flux, of heat into $E$ through $S$. That is, if $\rho(x, y, z)$ is the density of heat energy, then

$$\frac{\partial}{\partial t} \iiint_E \rho\, dV = \iint_S \mathbf{F} \cdot (-\mathbf{n})\, dS,$$

where we use $-\mathbf{n}$ because $\mathbf{n}$ is the outward unit normal vector, but we need to express the flux into $E$ through $S$.


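From the definition of $\mathbf{F}$, and the fact that $\rho = c\rho_0 T$, where $c$ is the specific heat and $\rho_0$ is the mass density, which, for simplicity, we assume to be constant, we have

$$\frac{\partial}{\partial t} \iiint_E c\rho_0 T\, dV = \iint_S K\nabla T \cdot \mathbf{n}\, dS.$$

Next, we note that because $c$, $\rho_0$, and $E$ do not depend on time, we can write

$$\iiint_E c\rho_0 \frac{\partial T}{\partial t}\, dV = \iint_S K\nabla T \cdot d\mathbf{S}.$$

Now, we apply the Divergence Theorem, and obtain

$$\iiint_E c\rho_0 \frac{\partial T}{\partial t}\, dV = \iiint_E K \operatorname{div}\nabla T\, dV = \iiint_E K\nabla^2 T\, dV.$$

That is,

$$\iiint_E \left( c\rho_0 \frac{\partial T}{\partial t} - K\nabla^2 T \right) dV = 0.$$

Since the solid $E$ is arbitrary and the integrand is continuous, it follows that

$$\frac{\partial T}{\partial t} = \frac{K}{c\rho_0} \nabla^2 T.$$

This is known as the heat equation, which is one of the most important partial differential equations in all of applied mathematics.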
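The text derives the heat equation but does not solve it. As an illustration only, the sketch below swaps in a standard explicit finite-difference scheme (forward in time, centered in space), which is not part of the text, for the one-dimensional case $T_t = \alpha T_{xx}$ on $[0, 1]$ with $T = 0$ at both ends. Here $\alpha$ stands in for $K/(c\rho_0)$ and is set to an arbitrary value. The initial profile $\sin(\pi x)$ is chosen because the exact solution $e^{-\alpha\pi^2 t}\sin(\pi x)$ is known, so the result can be compared against it; keeping $\alpha\,\Delta t/\Delta x^2 \le 1/2$ is the usual stability condition for this scheme.

```python
import math


def heat_ftcs(alpha, nx=50, t_end=0.1, r=0.4):
    """Explicit scheme for T_t = alpha * T_xx on [0, 1] with T = 0 at both ends,
    starting from T(x, 0) = sin(pi x).  r caps alpha * dt / dx^2 for stability."""
    dx = 1.0 / nx
    steps = math.ceil(t_end * alpha / (r * dx * dx))  # enough steps that alpha*dt/dx^2 <= r
    dt = t_end / steps
    lam = alpha * dt / (dx * dx)
    T = [math.sin(math.pi * i * dx) for i in range(nx + 1)]
    for _ in range(steps):
        # discrete Laplacian T[i+1] - 2 T[i] + T[i-1], boundary values held at 0
        T = ([0.0]
             + [T[i] + lam * (T[i + 1] - 2 * T[i] + T[i - 1]) for i in range(1, nx)]
             + [0.0])
    return T


alpha = 1.0  # illustrative stand-in for K / (c * rho0)
T = heat_ftcs(alpha)
exact_mid = math.exp(-alpha * math.pi ** 2 * 0.1)  # exact T(0.5, 0.1), since sin(pi/2) = 1
print(T[25], exact_mid)
```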

3.10 Differential Forms

To date, we have learned the following theorems concerning the evaluation of integrals of derivatives:

• The Fundamental Theorem of Calculus:
$$\int_a^b f'(x)\, dx = f(b) - f(a)$$

• The Fundamental Theorem of Line Integrals:
$$\int_a^b \nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\, dt = f(\mathbf{r}(b)) - f(\mathbf{r}(a))$$

• Green's Theorem:
$$\iint_D (Q_x - P_y)\, dA = \int_C P\, dx + Q\, dy$$

• Stokes' Theorem:
$$\iint_S \operatorname{curl}\mathbf{F} \cdot d\mathbf{S} = \int_C \mathbf{F} \cdot d\mathbf{r}$$

• Gauss' Divergence Theorem:
$$\iiint_E \operatorname{div}\mathbf{F}\, dV = \iint_S \mathbf{F} \cdot d\mathbf{S}$$

All of these theorems relate the integral of the derivative or gradient of a function, or partial derivatives of components of a vector field, over a higher-dimensional region to the integral or sum of the function or vector field over a lower-dimensional region. Now, we will see how the notation of differential forms can be used to combine all of these theorems into one. It is this notation, as opposed to vectors and operations such as the divergence and curl, that allows the Fundamental Theorem of Calculus to be generalized to functions of several variables.

A differential form is an expression consisting of a scalar-valued function $f : K \subseteq \mathbb{R}^n \to \mathbb{R}$ and zero or more infinitesimals of the form $dx_1, dx_2, \ldots, dx_n$, where $x_1, x_2, \ldots, x_n$ are the independent variables of $f$. The order of a differential form is defined to be the number of infinitesimals that it includes.

For simplicity, we set $n = 3$, so that we work with functions of three variables. With that in mind, a 0-form, or a differential form of order zero, is simply a scalar-valued function $f(x, y, z)$. A 1-form is a function $f(x, y, z)$ together with one of the expressions $dx$, $dy$ or $dz$. A 2-form is a function $f(x, y, z)$ together with a pair of distinct infinitesimals, which can be either $dx\, dy$, $dy\, dz$ or $dz\, dx$. Finally, a 3-form is an expression of the form $f(x, y, z)\, dx\, dy\, dz$.

Example The function $f(x, y, z) = x^2y + y^3z$ is a 0-form on $\mathbb{R}^3$, while $f\, dx = (x^2y + y^3z)\, dx$ and $f\, dy = (x^2y + y^3z)\, dy$ are both examples of a 1-form on $\mathbb{R}^3$. □

Example Let $f(x, y, z) = 1/(x^2 + y^2 + z^2)$. Then $f\, dx\, dy$ is a 2-form on $\mathbb{R}^3 - \{(0, 0, 0)\}$, while $f\, dx\, dy\, dz$ is a 3-form on the same domain. □

Forms of the same order can be added and scaled by functions, as the following examples show.


Example Let $f(x, y, z) = e^{x-y}\sin z$ and let $g(x, y, z) = (x^2 + y^2 + z^2)^{3/2}$. Then $f$, $g$ and $f + g$ are all 0-forms on $\mathbb{R}^3$, and

$$f + g = e^{x-y}\sin z + (x^2 + y^2 + z^2)^{3/2}.$$

That is, addition of 0-forms is identical to addition of functions.

If we define $\omega_1 = f\, dx$ and $\omega_2 = g\, dy$, then $\omega_1$ and $\omega_2$ are both 1-forms on $\mathbb{R}^3$, and so is $\omega = \omega_1 + \omega_2$, where

$$\omega = f\, dx + g\, dy = e^{x-y}\sin z\, dx + (x^2 + y^2 + z^2)^{3/2}\, dy.$$

Furthermore, if $h(x, y, z) = xy^2z^3$, and

$$\eta_1 = f\, dx\, dy, \quad \eta_2 = g\, dz\, dx$$

are 2-forms on $\mathbb{R}^3$, then

$$\eta = h\eta_1 + \eta_2 = xy^2z^3e^{x-y}\sin z\, dx\, dy + (x^2 + y^2 + z^2)^{3/2}\, dz\, dx$$

is also a 2-form on $\mathbb{R}^3$. □

Example Let $f(x, y, z) = \cos x$, $g(x, y, z) = e^y$ and $h(x, y, z) = xyz^2$. Then $\nu_1 = f\, dx\, dy\, dz$ and $\nu_2 = g\, dx\, dy\, dz$ are 3-forms on $\mathbb{R}^3$, and so is

$$\nu = \nu_1 + h\nu_2 = (\cos x + xyz^2e^y)\, dx\, dy\, dz.$$

□

It should be noted that, as with addition of functions, addition of differential forms is commutative and associative, and multiplication by a function distributes over it. Also, there is never any need to add forms of different order, such as adding a 0-form to a 1-form.

We now define two essential operations on differential forms. The first is called the wedge product, a multiplication operation for differential forms. Given a $k$-form $\omega$ and an $l$-form $\eta$, where $0 \le k + l \le 3$, the wedge product of $\omega$ and $\eta$, denoted by $\omega \wedge \eta$, is a $(k + l)$-form. It satisfies the following laws:

1. For each $k$ there is a $k$-form $0$ such that $\eta \wedge 0 = 0 \wedge \eta = 0$ for any $l$-form $\eta$.

2. Distributivity: If $f$ is a 0-form, then
$$(f\omega_1 + \omega_2) \wedge \eta = f(\omega_1 \wedge \eta) + (\omega_2 \wedge \eta).$$

3. Anticommutativity:
$$\omega \wedge \eta = (-1)^{kl}(\eta \wedge \omega).$$

4. Associativity:
$$\omega_1 \wedge (\omega_2 \wedge \omega_3) = (\omega_1 \wedge \omega_2) \wedge \omega_3.$$

5. Homogeneity: If $f$ is a 0-form, then
$$\omega \wedge (f\eta) = (f\omega) \wedge \eta = f(\omega \wedge \eta).$$

6. If $dx_i$ is a basic 1-form, then $dx_i \wedge dx_i = 0$.

7. If $f$ is a 0-form, then $f \wedge \omega = f\omega$.

Example Let $\omega = f\, dx$ and $\eta = g\, dy$ be 1-forms. Then

$$\omega \wedge \eta = (f\, dx \wedge g\, dy) = fg(dx \wedge dy) = fg\, dx\, dy,$$

by homogeneity, while

$$\eta \wedge \omega = (-1)^{1(1)}(\omega \wedge \eta) = -fg\, dx\, dy.$$

On the other hand, if $\nu = h\, dy\, dz$ is a 2-form, then

$$\nu \wedge \omega = fh(dy\, dz \wedge dx) = fh\, dy\, dz\, dx = -fh\, dy\, dx\, dz = fh\, dx\, dy\, dz$$

by homogeneity and anticommutativity, while

$$\nu \wedge \eta = gh(dy\, dz \wedge dy) = gh\, dy\, dz\, dy = -gh\, dy\, dy\, dz = 0.$$

□

Note that if any 3-form on $\mathbb{R}^3$ is multiplied by a $k$-form, where $k > 0$, then the result is zero, because the wedge product would contain more than three basic 1-forms, so at least one of $dx$, $dy$, $dz$ must appear twice.

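Example Let $\omega = x\, dx - y\, dy$ and $\eta = z\, dy\, dz - x\, dz\, dx$. Then

$$\begin{aligned}
\omega \wedge \eta &= (x\, dx - y\, dy) \wedge (z\, dy\, dz - x\, dz\, dx) \\
&= (x\, dx \wedge z\, dy\, dz) - (y\, dy \wedge z\, dy\, dz) - (x\, dx \wedge x\, dz\, dx) + (y\, dy \wedge x\, dz\, dx) \\
&= xz\, dx\, dy\, dz - yz\, dy\, dy\, dz - x^2\, dx\, dz\, dx + xy\, dy\, dz\, dx \\
&= xz\, dx\, dy\, dz - yz\, dy\, dy\, dz + x^2\, dx\, dx\, dz + xy\, dy\, dz\, dx \\
&= xz\, dx\, dy\, dz - 0 + 0 - xy\, dy\, dx\, dz \\
&= (xz + xy)\, dx\, dy\, dz.
\end{aligned}$$

□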
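The wedge product rules above are mechanical enough to code. The sketch below is a minimal model under a simplifying assumption: a form is stored as a dictionary from basis-index tuples (0 for $dx$, 1 for $dy$, 2 for $dz$) to coefficients evaluated at a single point, rather than as functions. Sorting the concatenated indices with adjacent swaps applies anticommutativity, and a repeated index gives zero by law 6. The names `sort_with_sign` and `wedge` are illustrative. At $(x, y, z) = (2, 3, 5)$ it reproduces the example's $(xz + xy)\, dx\, dy\, dz = 16\, dx\, dy\, dz$.

```python
def sort_with_sign(idx):
    """Sort a tuple of basis indices (0 = dx, 1 = dy, 2 = dz) into increasing
    order, tracking the sign picked up by each swap of adjacent factors.
    Returns (0, None) if an index repeats, since dx_i dx_i = 0."""
    if len(set(idx)) < len(idx):
        return 0, None
    idx = list(idx)
    sign = 1
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign  # anticommutativity of basic 1-forms
    return sign, tuple(idx)


def wedge(omega, eta):
    """Wedge product of two forms stored as {basis-index tuple: coefficient}."""
    result = {}
    for i1, c1 in omega.items():
        for i2, c2 in eta.items():
            sign, key = sort_with_sign(i1 + i2)
            if sign:
                result[key] = result.get(key, 0.0) + sign * c1 * c2
    return result


# Coefficients evaluated at one sample point (x, y, z) = (2, 3, 5).
x, y, z = 2.0, 3.0, 5.0
omega = {(0,): x, (1,): -y}    # x dx - y dy
eta = {(1, 2): z, (2, 0): -x}  # z dy dz - x dz dx

print(wedge(omega, eta))  # {(0, 1, 2): 16.0}, i.e. (xz + xy) dx dy dz at this point
```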

The second operation is differentiation. Given a $k$-form $\omega$, where $k < 3$, the derivative of $\omega$, denoted by $d\omega$, is a $(k + 1)$-form. It satisfies the following laws:

1. If $f$ is a 0-form, then
$$df = f_x\, dx + f_y\, dy + f_z\, dz.$$

2. Linearity: If $\omega_1$ and $\omega_2$ are $k$-forms, then
$$d(\omega_1 + \omega_2) = d\omega_1 + d\omega_2.$$

3. Product Rule: If $\omega$ is a $k$-form and $\eta$ is an $l$-form, then
$$d(\omega \wedge \eta) = (d\omega \wedge \eta) + (-1)^k(\omega \wedge d\eta).$$

4. The second derivative of a form is zero; that is, for any $k$-form $\omega$, $d(d\omega) = 0$.

We now illustrate the use of these differentiation rules.

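Example Let $\omega = x^2y^3z^4\, dx\, dy$ be a 2-form. Then, by Linearity and the Product Rule,

$$\begin{aligned}
d\omega &= [d(x^2y^3z^4) \wedge dx\, dy] + (-1)^0[x^2y^3z^4 \wedge d(dx\, dy)] \\
&= \left[ \left( (x^2y^3z^4)_x\, dx + (x^2y^3z^4)_y\, dy + (x^2y^3z^4)_z\, dz \right) \wedge dx\, dy \right] + \left[ x^2y^3z^4 \wedge \{ (d(dx) \wedge dy) + (-1)^1(dx \wedge d(dy)) \} \right] \\
&= \left[ \left( 2xy^3z^4\, dx + 3x^2y^2z^4\, dy + 4x^2y^3z^3\, dz \right) \wedge dx\, dy \right] + \left[ x^2y^3z^4 \wedge \{ (0 \wedge dy) - (dx \wedge 0) \} \right] \\
&= 2xy^3z^4\, dx\, dx\, dy + 3x^2y^2z^4\, dy\, dx\, dy + 4x^2y^3z^3\, dz\, dx\, dy + 0 \\
&= 4x^2y^3z^3\, dz\, dx\, dy \\
&= -4x^2y^3z^3\, dx\, dz\, dy \\
&= 4x^2y^3z^3\, dx\, dy\, dz.
\end{aligned}$$

In general, differentiating a $k$-form $\omega$, when $k > 0$, only requires differentiating the coefficient function with respect to the variables whose basic 1-forms do not already appear in $\omega$. In this example, since $\omega = f\, dx\, dy$, we obtain $d\omega = f_z\, dz\, dx\, dy = f_z\, dx\, dy\, dz$. □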
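The differentiation rule can be modeled the same way as the wedge product, using $d(f\, dx_I) = \sum_j f_{x_j}\, dx_j\, dx_I$. The sketch below assumes forms are stored as dictionaries from basis-index tuples to coefficient functions, evaluates $d\omega$ at a single point, and approximates partial derivatives by central differences (a numerical stand-in, not the exact symbolic derivative). Terms whose $dx_j$ already appears in $\omega$ vanish automatically, which is the shortcut noted above. For $\omega = x^2y^3z^4\, dx\, dy$ at $(1.5, 0.5, 2)$ the exact coefficient $4x^2y^3z^3$ equals 9. All function names are illustrative.

```python
def sort_with_sign(idx):
    """Sort basis indices (0 = dx, 1 = dy, 2 = dz), tracking the sign of the swaps;
    return (0, None) if an index repeats, since dx_i dx_i = 0."""
    if len(set(idx)) < len(idx):
        return 0, None
    idx = list(idx)
    sign = 1
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)


def partial(f, p, j, h=1e-5):
    """Central-difference approximation of the partial derivative of f in coordinate j at p."""
    plus, minus = list(p), list(p)
    plus[j] += h
    minus[j] -= h
    return (f(*plus) - f(*minus)) / (2 * h)


def d_at(form, p):
    """Value at the point p of d(form), where form maps basis-index tuples to
    coefficient functions, using d(f dx_I) = sum_j f_j dx_j dx_I."""
    result = {}
    for idx, f in form.items():
        for j in range(3):
            sign, key = sort_with_sign((j,) + idx)
            if sign:  # skipped when dx_j already appears in dx_I
                result[key] = result.get(key, 0.0) + sign * partial(f, p, j)
    return result


omega = {(0, 1): lambda x, y, z: x ** 2 * y ** 3 * z ** 4}  # x^2 y^3 z^4 dx dy
p = (1.5, 0.5, 2.0)
print(d_at(omega, p))  # approximately {(0, 1, 2): 9.0}
```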

We now consider the kind of differential forms that appear in the theorems of vector calculus.


• Let $\omega = f(x, y, z)$ be a 0-form. Then, by the first law of differentiation,

$$d\omega = \nabla f \cdot \langle dx, dy, dz\rangle.$$

If $C$ is a smooth curve with parameterization $\mathbf{r}(t) = \langle x(t), y(t), z(t)\rangle$, $a \le t \le b$, then

$$\int_a^b \nabla f(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\, dt = \int_a^b \nabla f(\mathbf{r}(t)) \cdot \langle x'(t), y'(t), z'(t)\rangle\, dt = \int_a^b d\omega(\mathbf{r}(t)) = \int_C d\omega.$$

It follows from the Fundamental Theorem of Line Integrals that

$$\int_C d\omega = \omega(\mathbf{r}(b)) - \omega(\mathbf{r}(a)).$$

The boundary of $C$, $\partial C$, consists of its initial point $A$ and terminal point $B$. If we define the “integral” of a 0-form $\omega$ over this 0-dimensional region by

$$\int_{\partial C} \omega = \omega(B) - \omega(A),$$

which makes sense considering that, intuitively, the numbers $1$ and $-1$ serve as an appropriate “outward unit normal vector” at the terminal and initial points, respectively, then we have

$$\int_C d\omega = \int_{\partial C} \omega.$$

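• Let $\omega = P(x, y)\, dx + Q(x, y)\, dy$ be a 1-form. Then

$$\begin{aligned}
d\omega &= d[P(x, y)\, dx] + d[Q(x, y)\, dy] \\
&= dP(x, y) \wedge dx + P(x, y) \wedge d(dx) + dQ(x, y) \wedge dy + Q(x, y) \wedge d(dy) \\
&= (P_x\, dx + P_y\, dy) \wedge dx + 0 + (Q_x\, dx + Q_y\, dy) \wedge dy + 0 \\
&= P_x\, dx\, dx + P_y\, dy\, dx + Q_x\, dx\, dy + Q_y\, dy\, dy \\
&= (Q_x - P_y)\, dx\, dy.
\end{aligned}$$

It follows from Green's Theorem that

$$\int_C \omega = \iint_D d\omega.$$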


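• If we proceed similarly with a 1-form

$$\omega = \mathbf{F} \cdot \langle dx, dy, dz\rangle = P(x, y, z)\, dx + Q(x, y, z)\, dy + R(x, y, z)\, dz,$$

then we obtain

$$d\omega = \operatorname{curl}\mathbf{F} \cdot \langle dy\, dz, dz\, dx, dx\, dy\rangle = (R_y - Q_z)\, dy\, dz + (P_z - R_x)\, dz\, dx + (Q_x - P_y)\, dx\, dy.$$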

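Let $S$ be a smooth surface parameterized by

$$\mathbf{r}(u, v) = \langle x(u, v), y(u, v), z(u, v)\rangle, \quad (u, v) \in D.$$

Then the (unnormalized) normal vector $\mathbf{r}_u \times \mathbf{r}_v$ is given by

$$\begin{aligned}
\mathbf{r}_u \times \mathbf{r}_v &= \langle x_u, y_u, z_u\rangle \times \langle x_v, y_v, z_v\rangle \\
&= \langle y_uz_v - z_uy_v, z_ux_v - x_uz_v, x_uy_v - y_ux_v\rangle \\
&= \left\langle \frac{\partial(y, z)}{\partial(u, v)}, \frac{\partial(z, x)}{\partial(u, v)}, \frac{\partial(x, y)}{\partial(u, v)} \right\rangle.
\end{aligned}$$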

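We then have

$$\begin{aligned}
\iint_S \operatorname{curl}\mathbf{F} \cdot d\mathbf{S} &= \iint_S \operatorname{curl}\mathbf{F} \cdot \mathbf{n}\, dS \\
&= \iint_D \operatorname{curl}\mathbf{F}(\mathbf{r}(u, v)) \cdot (\mathbf{r}_u \times \mathbf{r}_v)\, du\, dv \\
&= \iint_D \Big\{ [R_y(\mathbf{r}(u, v)) - Q_z(\mathbf{r}(u, v))] \frac{\partial(y, z)}{\partial(u, v)} + [P_z(\mathbf{r}(u, v)) - R_x(\mathbf{r}(u, v))] \frac{\partial(z, x)}{\partial(u, v)} \\
&\qquad + [Q_x(\mathbf{r}(u, v)) - P_y(\mathbf{r}(u, v))] \frac{\partial(x, y)}{\partial(u, v)} \Big\}\, du\, dv \\
&= \iint_S (R_y - Q_z)\, dy\, dz + (P_z - R_x)\, dz\, dx + (Q_x - P_y)\, dx\, dy \\
&= \iint_S d\omega.
\end{aligned}$$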

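If $C$ is the boundary curve of $S$, and $C$ is parameterized by $\mathbf{r}(t) = \langle x(t), y(t), z(t)\rangle$, $a \le t \le b$, then

$$\begin{aligned}
\int_C \mathbf{F} \cdot d\mathbf{r} &= \int_a^b \mathbf{F}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\, dt \\
&= \int_a^b \langle P(\mathbf{r}(t)), Q(\mathbf{r}(t)), R(\mathbf{r}(t))\rangle \cdot \langle x'(t), y'(t), z'(t)\rangle\, dt \\
&= \int_a^b \omega(\mathbf{r}(t)) \\
&= \int_C \omega.
\end{aligned}$$

It follows from Stokes' Theorem that

$$\int_C \omega = \iint_S d\omega.$$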

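• Let $\mathbf{F} = \langle P, Q, R\rangle$. Let $\omega$ be the 2-form

$$\omega = P\, dy\, dz + Q\, dz\, dx + R\, dx\, dy.$$

Then

$$\begin{aligned}
d\omega &= dP\, dy\, dz + dQ\, dz\, dx + dR\, dx\, dy \\
&= [P_x\, dx + P_y\, dy + P_z\, dz]\, dy\, dz + [Q_x\, dx + Q_y\, dy + Q_z\, dz]\, dz\, dx + [R_x\, dx + R_y\, dy + R_z\, dz]\, dx\, dy \\
&= P_x\, dx\, dy\, dz + Q_y\, dy\, dz\, dx + R_z\, dz\, dx\, dy \\
&= P_x\, dx\, dy\, dz - Q_y\, dy\, dx\, dz - R_z\, dx\, dz\, dy \\
&= P_x\, dx\, dy\, dz + Q_y\, dx\, dy\, dz + R_z\, dx\, dy\, dz \\
&= \operatorname{div}\mathbf{F}\, dx\, dy\, dz.
\end{aligned}$$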

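Let $E$ be a solid enclosed by a smooth surface $S$ with positive orientation, and let $S$ be parameterized by

$$\mathbf{r}(u, v) = \langle x(u, v), y(u, v), z(u, v)\rangle, \quad (u, v) \in D.$$

We then have

$$\begin{aligned}
\iint_S \mathbf{F} \cdot d\mathbf{S} &= \iint_S \mathbf{F} \cdot \mathbf{n}\, dS \\
&= \iint_D \mathbf{F}(\mathbf{r}(u, v)) \cdot (\mathbf{r}_u \times \mathbf{r}_v)\, du\, dv \\
&= \iint_D \langle P(\mathbf{r}(u, v)), Q(\mathbf{r}(u, v)), R(\mathbf{r}(u, v))\rangle \cdot \left\langle \frac{\partial(y, z)}{\partial(u, v)}, \frac{\partial(z, x)}{\partial(u, v)}, \frac{\partial(x, y)}{\partial(u, v)} \right\rangle du\, dv \\
&= \iint_D \left[ P(\mathbf{r}(u, v)) \frac{\partial(y, z)}{\partial(u, v)} + Q(\mathbf{r}(u, v)) \frac{\partial(z, x)}{\partial(u, v)} + R(\mathbf{r}(u, v)) \frac{\partial(x, y)}{\partial(u, v)} \right] du\, dv \\
&= \iint_S P\, dy\, dz + Q\, dz\, dx + R\, dx\, dy \\
&= \iint_S \omega.
\end{aligned}$$

It follows from the Divergence Theorem that

$$\iint_S \omega = \iiint_E d\omega.$$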
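The two-variable case in the list above can be checked numerically. Green's Theorem, as listed at the start of this section, pairs $\int_C P\, dx + Q\, dy$ with $\iint_D (Q_x - P_y)\, dA$, so the coefficient of $dx\, dy$ in $d(P\, dx + Q\, dy)$ must be $Q_x - P_y$. The sketch below uses the hypothetical choice $P = -y^3$, $Q = x^3$ on the unit disk (not from the text), for which $Q_x - P_y = 3x^2 + 3y^2$ and both sides equal $3\pi/2$, while the transposed expression $Q_y - P_x$ would give 0, so the check distinguishes the two.

```python
import math

# Hypothetical choice of P and Q (not from the text).
P = lambda x, y: -y ** 3
Q = lambda x, y: x ** 3
Qx_minus_Py = lambda x, y: 3 * x ** 2 + 3 * y ** 2  # coefficient of dx dy in d(P dx + Q dy)


def boundary_side(n=1000):
    """Midpoint estimate of the integral of P dx + Q dy over the unit circle, counterclockwise."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = math.cos(t), math.sin(t)
        total += P(x, y) * (-math.sin(t)) + Q(x, y) * math.cos(t)  # dx = -sin t dt, dy = cos t dt
    return total * h


def region_side(g, nr=400, nt=200):
    """Midpoint estimate of the integral of g over the unit disk, in polar coordinates."""
    hr, ht = 1.0 / nr, 2 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * hr
        for j in range(nt):
            t = (j + 0.5) * ht
            total += g(r * math.cos(t), r * math.sin(t)) * r  # area element r dr dtheta
    return total * hr * ht


print(boundary_side(), region_side(Qx_minus_Py), 3 * math.pi / 2)
```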

Putting all of these results together, we obtain the following combined theorem, known as the General Stokes' Theorem:

If $M$ is a compact, oriented $k$-manifold with boundary $\partial M$, and $\omega$ is a smooth $(k - 1)$-form defined on an open set containing $M$, then

$$\int_{\partial M} \omega = \int_M d\omega.$$

The importance of this unified theorem is that, unlike the previously stated theorems of vector calculus, this theorem, through the language of differential forms, can be generalized to functions of any number of variables. This is because operations on differential forms are not defined in terms of other operations, such as the cross product, that are limited to three variables. For example, given a 3-form $\omega = f(x, y, z, w)\, dx\, dy\, dw$, its integral over a 3-dimensional, closed, positively oriented hypersurface $S$ embedded in $\mathbb{R}^4$ is equal to the integral of $d\omega$ over the 4-dimensional solid $E$ that is enclosed by $S$, where $d\omega$ is computed using the previously stated rules for differentiation and multiplication of differential forms.
