
LINEAR MAPS, THE TOTAL DERIVATIVE AND THE CHAIN RULE

ROBERT LIPSHITZ

Abstract. We will discuss the notion of linear maps and introduce the total derivative of a function f : R^n → R^m as a linear map. We will then discuss composition of linear maps and the chain rule for derivatives.

Contents

1. Maps R^n → R^m
2. Linear maps
3. Matrices
4. The total derivative and the Jacobian matrix
   4.1. Review of the derivative as linear approximation
   4.2. The total derivative of a function R^n → R^m
   4.3. The Jacobian matrix
5. Composition of linear maps and matrix multiplication
   5.1. Matrix arithmetic
6. The chain rule for total derivatives
   6.1. Comparison with the treatment in Stewart’s Calculus

1. Maps R^n → R^m

So far in this course, we have talked about:
    • Functions R → R; you worked with these a lot in Calculus 1.
    • Parametric curves, i.e., functions R → R^2 and R → R^3 , and
    • Functions of several variables, i.e., functions R^2 → R and R^3 → R.

(We’ve probably also seen some examples of maps R^n → R for some n > 3, but we haven’t worked with these so much.) What we haven’t talked about much are functions R^3 → R^3, say, or R^3 → R^2, or R^2 → R^2.

Definition 1.1. A function f : R^3 → R^3 is something which takes as input a vector in R^3 and gives as output a vector in R^3. Similarly, a map R^3 → R^2 takes as input a vector in R^3 and gives as output a vector in R^2 ; and so on.

Example 1.2. Rotation by π/6 counter-clockwise around the z-axis is a function R^3 → R^3 : it takes as input a vector in R^3 and gives as output a vector in R^3. Let’s let R(~v) denote rotation of ~v by π/6 counter-clockwise around the z-axis. Then, doing some trigonometry

Copyright 2009 by Robert Lipshitz. Released under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.


(see Figure 1 for the two-dimensional analogue), we have for example

R(1, 0, 0) = (cos(π/6), sin(π/6), 0) = (√3/2, 1/2, 0)

R(0, 1, 0) = (−sin(π/6), cos(π/6), 0) = (−1/2, √3/2, 0)

R(0, 0, 1) = (0, 0, 1)

R(2, 3, 1) = (2 cos(π/6) − 3 sin(π/6), 2 sin(π/6) + 3 cos(π/6), 1).

Example 1.3. Translation by the vector (1, 2 , 3) is a map R^3 → R^3 : it takes as input any vector ~v and gives as output ~v + (1, 2 , 3). Let T (~v) denote translation of ~v by (1, 2 , 3). Then, for example

T(1, 0, 0) = (2, 2, 3)
T(0, 1, 0) = (1, 3, 3)
T(0, 0, 1) = (1, 2, 4)
T(2, 3, 1) = (3, 5, 4).

Example 1.4. Orthogonal projection from R^3 to the xy-plane can be viewed as a function P : R^3 → R^2. Then, for example

P(1, 0, 0) = (1, 0)
P(0, 1, 0) = (0, 1)
P(0, 0, 1) = (0, 0)
P(2, 3, 1) = (2, 3).

Example 1.5. Rotation by an angle θ around the origin gives a map Rθ : R^2 → R^2. For example,

Rθ(1, 0) = (cos(θ), sin(θ))
Rθ(0, 1) = (−sin(θ), cos(θ))
Rθ(2, 3) = (2 cos(θ) − 3 sin(θ), 2 sin(θ) + 3 cos(θ)).

(See Figure 1 for the trigonometry leading to the computation of Rθ(2, 3).)
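Each of these maps is easy to experiment with on a computer. Here is a minimal Python/NumPy sketch of R, T, P, and Rθ (the function names are ad hoc), together with spot checks of a few of the values computed above.

    import numpy as np

    def R(v):
        # Rotation by pi/6 counter-clockwise around the z-axis (Example 1.2).
        c, s = np.cos(np.pi/6), np.sin(np.pi/6)
        x, y, z = v
        return np.array([c*x - s*y, s*x + c*y, z])

    def T(v):
        # Translation by (1, 2, 3) (Example 1.3).
        return np.asarray(v, dtype=float) + np.array([1.0, 2.0, 3.0])

    def P(v):
        # Orthogonal projection onto the xy-plane (Example 1.4).
        return np.array([v[0], v[1]])

    def R_theta(theta, v):
        # Rotation of the plane by the angle theta (Example 1.5).
        c, s = np.cos(theta), np.sin(theta)
        return np.array([c*v[0] - s*v[1], s*v[0] + c*v[1]])

    print(R([1, 0, 0]))              # approximately (0.866, 0.5, 0) = (sqrt(3)/2, 1/2, 0)
    print(T([2, 3, 1]))              # (3, 5, 4)
    print(P([2, 3, 1]))              # (2, 3)
    print(R_theta(np.pi/2, [1, 0]))  # approximately (0, 1)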

Just like a function R → R^3 corresponds to three functions R → R, a function R^3 → R^3 corresponds to three functions R^3 → R. That is, any function F : R^3 → R^3 is given by

F (x, y, z) = (u(x, y, z), v(x, y, z), w(x, y, z)).

Example 1.6. The function R from Example 1.2 has the form

R(x, y, z) = (x cos(π/6) − y sin(π/6), x sin(π/6) + y cos(π/6), z).

Example 1.7. The function T from Example 1.3 has the form

T (x, y, z) = (x + 1, y + 2, z + 3).

Example 1.8. There is a function which takes a point written in cylindrical coordinates and rewrites it in rectangular coordinates, which we can view as a map R^3 → R^3 given by

F (r, θ, z) = (r cos(θ), r sin(θ), z). Similarly, there is a function which takes a point written in spherical coordinates and rewrites it in rectangular coordinates; viewed as a function R^3 → R^3 it is given by

G(ρ, θ, φ) = (ρ cos(θ) sin(φ), ρ sin(θ) sin(φ), ρ cos(φ)).


Example 1.10. For Rθ the rotation map from Example 1.5, Rπ/2 ◦ Rπ/6 means you first rotate by π/6 and then by another π/2. So, this is just a rotation by π/2 + π/6 = 2π/3, i.e.,

Rπ/2 ◦ Rπ/6 = R2π/3.

Example 1.11. If you apply the rotation map R from Example 1.2 and then the projection map P from Example 1.4, the result is the same as applying P first and then rotating in the plane by π/6. That is, P ◦ R = Rπ/6 ◦ P.

In terms of coordinates, composition of maps corresponds to substituting variables, as the following example illustrates.

Example 1.12. Let R be the function from Example 1.2 and T the function from Example 1.3; that is,

R(x, y, z) = (x cos(π/6) − y sin(π/6), x sin(π/6) + y cos(π/6), z)
T(x, y, z) = (x + 1, y + 2, z + 3).

Then,

T ◦ R(x, y, z) = T(x cos(π/6) − y sin(π/6), x sin(π/6) + y cos(π/6), z)
= (x cos(π/6) − y sin(π/6) + 1, x sin(π/6) + y cos(π/6) + 2, z + 3).

Similarly,

R ◦ T(x, y, z) = R(x + 1, y + 2, z + 3)
= ((x + 1) cos(π/6) − (y + 2) sin(π/6), (x + 1) sin(π/6) + (y + 2) cos(π/6), z + 3).

Notice that composition of maps is not commutative: R ◦ T is not the same as T ◦ R. Geometrically, this says that translating and then rotating is not the same as rotating and then translating; if you think about it, that makes sense.

As another example of composition as substitution of variables, let’s do Example 1.11 in terms of coordinates:

Example 1.13. Let’s compute P ◦ R in coordinates. We had

P(x, y, z) = (x, y)
R(x, y, z) = (x cos(π/6) − y sin(π/6), x sin(π/6) + y cos(π/6), z).

So,

P ◦ R(x, y, z) = P(x cos(π/6) − y sin(π/6), x sin(π/6) + y cos(π/6), z)
= (x cos(π/6) − y sin(π/6), x sin(π/6) + y cos(π/6)).

On the other hand,

Rπ/6(x, y) = (x cos(π/6) − y sin(π/6), x sin(π/6) + y cos(π/6)).

So,

Rπ/6 ◦ P(x, y, z) = Rπ/6(x, y) = (x cos(π/6) − y sin(π/6), x sin(π/6) + y cos(π/6)).

So, again we see that P ◦ R = Rπ/6 ◦ P.

As one last example, we’ll do Example 1.10 again in terms of coordinates:


Example 1.14. We have

Rπ/2(x, y) = (x cos(π/2) − y sin(π/2), x sin(π/2) + y cos(π/2)) = (−y, x)
Rπ/6(x, y) = (x cos(π/6) − y sin(π/6), x sin(π/6) + y cos(π/6)) = (x√3/2 − y/2, x/2 + y√3/2)
R2π/3(x, y) = (x cos(2π/3) − y sin(2π/3), x sin(2π/3) + y cos(2π/3)) = (−x/2 − y√3/2, x√3/2 − y/2).

Composing Rπ/2 and Rπ/6 we get

Rπ/2 ◦ Rπ/6(x, y) = Rπ/2(x√3/2 − y/2, x/2 + y√3/2) = (−x/2 − y√3/2, x√3/2 − y/2).

So, again we see that Rπ/2 ◦ Rπ/6 = R2π/3.
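A quick numerical check of these composition computations is easy to carry out; here is a minimal Python/NumPy sketch (the helper names are ad hoc). It confirms that Rπ/2 ◦ Rπ/6 agrees with R2π/3, and that composing the rotation R of Example 1.2 with the translation T of Example 1.3 in the two possible orders gives different answers.

    import numpy as np

    def rot2(theta, v):
        # Rotation of the plane by the angle theta (Example 1.5).
        c, s = np.cos(theta), np.sin(theta)
        return np.array([c*v[0] - s*v[1], s*v[0] + c*v[1]])

    def R(v):
        # Rotation by pi/6 around the z-axis (Example 1.2).
        x, y = rot2(np.pi/6, v[:2])
        return np.array([x, y, v[2]])

    def T(v):
        # Translation by (1, 2, 3) (Example 1.3).
        return np.asarray(v, dtype=float) + np.array([1.0, 2.0, 3.0])

    v = np.array([2.0, 3.0])
    print(rot2(np.pi/2, rot2(np.pi/6, v)))   # R_{pi/2} o R_{pi/6} applied to v
    print(rot2(2*np.pi/3, v))                # R_{2pi/3} applied to v -- same vector

    w = np.array([2.0, 3.0, 1.0])
    print(T(R(w)))   # T o R
    print(R(T(w)))   # R o T -- different, so composition is not commutative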

Exercise 1.15. Let T be the map from Example 1.3. What does the map T ◦ T mean geometrically? What is it in coordinates?

Exercise 1.16. Let F be the “write cylindrical coordinates in rectangular coordinates” map from Example 1.8. Let H(r, θ, z) = (r, θ + π/ 6 , z). Compute F ◦ H in coordinates.

Exercise 1.17. Let Rθ be the rotation map from Example 1.5. Compose Rθ and Rφ by substituting, like in Example 1.14. We also know that Rφ ◦ Rθ = Rθ+φ. Deduce the formulas for sin(θ + φ) and cos(θ + φ).

2. Linear maps

We’ll start with a special case:

Definition 2.1. A map F : R^2 → R^2 is linear if F can be written in the form

F (x, y) = (ax + by, cx + dy)

for some real numbers a, b, c, d.

That is, a linear map is one given by homogeneous linear equations:

x_new = a x_old + b y_old
y_new = c x_old + d y_old.

(The word homogeneous means that there are no constant terms.)

Example 2.2. The map f (x, y) = (2x + 3y, x + y) is a linear map R^2 → R^2.

Example 2.3. The map Rπ/2 : R^2 → R^2 which is rotation by the angle π/2 around the origin is a linear map: it is given in coordinates as

Rπ/2(x, y) = (−y, x) = (0x + (−1)y, 1x + 0y).

Example 2.4. More generally, the map Rθ : R^2 → R^2 which is rotation by the angle θ around the origin is a linear map: it is given in coordinates as

Rθ(x, y) = (cos(θ)x − sin(θ)y, sin(θ)x + cos(θ)y).

(Notice that cos(θ) and sin(θ) are constants, so it’s fine for them to be coefficients in a linear map.)

More generally:


Lemma 2.10. Let F : R^n → R^m be a linear map. Then for any ~v, ~w in R^n and λ in R,

  • F(~v + ~w) = F(~v) + F(~w) and
  • F(λ~v) = λF(~v).

Proof. Again, to keep notation simple, we will just prove the lemma for maps R^2 → R^2. Suppose F(x, y) = (ax + by, cx + dy). Let ~v = (r, s) and ~w = (t, u). Then

F(~v + ~w) = F(r + t, s + u) = (a(r + t) + b(s + u), c(r + t) + d(s + u))
           = (ar + bs, cr + ds) + (at + bu, ct + du) = F(~v) + F(~w)
F(λ~v) = F(λr, λs) = (aλr + bλs, cλr + dλs) = λ(ar + bs, cr + ds) = λF(~v),

as desired. 

Example 2.11. The map F from Example 1.8 is not linear. The form we wrote it in is certainly not that of Definition 2.5. But this doesn’t necessarily mean F cannot be written in the form of Definition 2.5. To see that F cannot be written in that form, we use Lemma 2.10. If we take ~v = (1, π/2, 0) and λ = 2 then

F(λ~v) = F(2, π, 0) = (2 cos(π), 2 sin(π), 0) = (−2, 0, 0)
λF(~v) = 2F(1, π/2, 0) = 2(cos(π/2), sin(π/2), 0) = 2(0, 1, 0) = (0, 2, 0).

So F(λ~v) ≠ λF(~v), and F is not linear.

Example 2.12. The function f (x) = x^2 : R → R is not linear: taking x = 5 and λ = 2 we have f (λx) = 100 but λf (x) = 50.

Example 2.13. The function f(x) = |x| : R → R is not linear: taking ~v = (1) and ~w = (−1), f(~v + ~w) = f(0) = 0 but f(~v) + f(~w) = 1 + 1 = 2.
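Lemma 2.10 also gives a quick numerical way to spot a non-linear map: test the two properties on a few inputs. Here is a minimal Python/NumPy sketch reproducing the checks in Examples 2.11 and 2.12.

    import numpy as np

    def F(v):
        # The cylindrical-to-rectangular map of Example 1.8.
        r, theta, z = v
        return np.array([r*np.cos(theta), r*np.sin(theta), z])

    v = np.array([1.0, np.pi/2, 0.0])
    lam = 2.0
    print(F(lam*v))    # approximately (-2, 0, 0)
    print(lam*F(v))    # approximately (0, 2, 0) -- not equal, so F is not linear

    def f(x):
        # The function f(x) = x^2 of Example 2.12.
        return x**2

    print(f(2*5.0), 2*f(5.0))   # 100.0 vs 50.0 -- also not linear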

Remark 2.14. The converse to Lemma 2.10 is also true; the proof is slightly (but not much) harder. We outline the argument in Challenge Problem 2.17, below.

Exercise 2.15. Which of the following maps are linear? Justify your answers.

(1) f : R^2 → R^2 defined by f(x, y) = (−5x − 7y, x + y).
(2) f : R^3 → R^2 defined by f(x, y, z) = (xy + yz, xz + yz).
(3) f : R^2 → R defined by f(x, y) = |x| − |y|.
(4) f : R^3 → R^2 defined by f(x, y, z) = (sin(π/7)x + cos(π/7)y, −e^3 z).
(5) f : R^2 → R^2 defined by

f(x, y) = ((x^3 + xy^2 − yx^2 − y^3)/(x^2 + y^2), x + y)   if (x, y) ≠ (0, 0)
f(x, y) = (0, 0)                                           if (x, y) = (0, 0).

Exercise 2.16. Use the formula from Lemma 2.8 to compute Rπ/6 ◦ Rπ/6.


Challenge Problem 2.17. In this problem we will prove the converse of Lemma 2.10, in the m = n = 2 case. Suppose that f : R^2 → R^2 satisfies f(~v + ~w) = f(~v) + f(~w) and f(λ~v) = λf(~v) for any ~v, ~w in R^2 and λ in R. Let (a, c) = f(1, 0) and (b, d) = f(0, 1). Prove that for any (x, y), f(x, y) = (ax + by, cx + dy).

3. Matrices

Let’s look again at the form of a linear map F from Definition 2.1:

F(x, y) = (ax + by, cx + dy).

If we want to specify such a map, all we need to do is specify a, b, c and d: the x and the y are just placeholders. So, we could record this data by writing:

[F] = [ a  b ]
      [ c  d ]

This is a matrix, i.e., a rectangular array of numbers. The main point to keep in mind is that a matrix is just a shorthand for a linear map.^1 If we are not just interested in linear maps R^2 → R^2, our matrices will not be 2 × 2. For a linear map F : R^n → R^m given by

F(x_1, ..., x_n) = (a_{1,1}x_1 + a_{1,2}x_2 + ··· + a_{1,n}x_n,
                    a_{2,1}x_1 + a_{2,2}x_2 + ··· + a_{2,n}x_n,
                    ...,
                    a_{m,1}x_1 + a_{m,2}x_2 + ··· + a_{m,n}x_n),

as in Definition 2.5, the corresponding matrix is

[F] = [ a_{1,1}  a_{1,2}  ···  a_{1,n} ]
      [ a_{2,1}  a_{2,2}  ···  a_{2,n} ]
      [   ...      ...           ...   ]
      [ a_{m,1}  a_{m,2}  ···  a_{m,n} ]

Notice that this matrix has m rows and n columns. Again: the matrix for a linear map F : R^n → R^m is an m × n matrix. (Notice also that I am writing [F] to denote the matrix for F.)

Example 3.1. Suppose P is the projection map from Example 1.4. Then the matrix for P is

[P] = [ 1  0  0 ]
      [ 0  1  0 ]

Notice that [P] is a 2 × 3 matrix, and P maps R^3 → R^2.

Example 3.2. If R is the rotation map from Example 1.2 then the matrix for R is

[R] = [ cos(π/6)  −sin(π/6)  0 ]
      [ sin(π/6)   cos(π/6)  0 ]
      [    0          0      1 ]

Notice that [R] is a 3 × 3 matrix and R maps R^3 → R^3.
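Notice also that the columns of [P] and [R] are exactly the values of those maps on the standard basis vectors, as computed in Examples 1.2 and 1.4. That observation gives a quick way to produce the matrix of a linear map on a computer; here is a minimal NumPy sketch for [R].

    import numpy as np

    def R(v):
        # Rotation by pi/6 around the z-axis (Example 1.2).
        c, s = np.cos(np.pi/6), np.sin(np.pi/6)
        return np.array([c*v[0] - s*v[1], s*v[0] + c*v[1], v[2]])

    # Apply R to the standard basis vectors and record the results as columns.
    basis = np.eye(3)
    matrix_R = np.column_stack([R(basis[:, j]) for j in range(3)])
    print(matrix_R)
    # [[ 0.866 -0.5    0.   ]
    #  [ 0.5    0.866  0.   ]
    #  [ 0.     0.     1.   ]]   -- matches [R] above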

^1 There are also other uses of matrices, but their use as stand-ins for linear maps is their most important use, and the only way we will use them.


Figure 3. Multiplying a matrix by a vector. Two examples are shown, with arrows indicating your fingers during the computation. In the second example, only the nonzero terms are marked.

(3) [F] = …, ~v = …
(4) [F] = …, ~v = …
(5) [F] = …, ~v = …

4. The total derivative and the Jacobian matrix

The total derivative of a map F : R^n → R^m at a point ~p in R^n is the best linear approximation to F near ~p. We will make this precise in Section 4.2. First, though, we review the various kinds of linear approximations we have seen already.

4.1. Review of the derivative as linear approximation. Suppose that f : R → R is a differentiable function. Then near any point a ∈ R we can approximate f(x) using f(a) and f ′(a):

(4.1) f(a + h) ≈ f(a) + f ′(a)h.

This is a good approximation in the sense that if we let

ε(h) = f(a + h) − f(a) − f ′(a)h

denote the error in the approximation (4.1), then ε(h) goes to zero faster than linearly:

lim_{h→0} ε(h)/h = 0.
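To see this behaviour numerically, here is a minimal Python sketch; the choice of f(x) = sin(x) and a = 1 is just for illustration. The last column, ε(h)/h, shrinks as h does, i.e. the error goes to zero faster than linearly.

    import math

    a = 1.0
    fprime = math.cos(a)   # derivative of sin at a

    for h in [0.1, 0.01, 0.001]:
        eps = math.sin(a + h) - math.sin(a) - fprime*h   # the error term eps(h)
        print(h, eps, eps/h)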

Similarly, suppose F : R → R^n is a parametric curve in R^n. If F is differentiable at some a in R then

F(a + h) ≈ F(a) + F ′(a)h.

This is now a vector equation; the + denotes vector addition and F ′(a)h is scalar multiplication of the vector F ′(a) by the real number h. The sense in which this is a good approximation is the same as before: the error goes to 0 faster than linearly. In symbols:

lim_{h→0} [F(a + h) − F(a) − F ′(a)h]/h = ~0.

(Again, this is a vector equation now.) As a final special case, consider a function F : R^n → R. To keep notation simple, let’s consider the case n = 2. Then for any ~a = (a_1, a_2) in R^2,

(4.2) F(~a + (h_1, h_2)) ≈ F(~a) + (∂F/∂x)h_1 + (∂F/∂y)h_2.

Here, the ≈ means that

lim_{(h_1,h_2)→(0,0)} [F(~a + (h_1, h_2)) − F(~a) − (∂F/∂x)h_1 − (∂F/∂y)h_2] / √(h_1^2 + h_2^2) = 0.

Notice that we could rewrite Equation (4.2) as

F (~a + ~h) ≈ F (~a) + (∇F ) · ~h.

The dot product (∇F ) · ~h term looks a lot like matrix multiplication. Indeed, if we define D~aF to be the linear map with matrix

[D~aF] = [ ∂F/∂x   ∂F/∂y ]

then Equation (4.2) becomes

F (~a + ~h) ≈ F (~a) + (D~aF )(~h).

Thus inspired...


Example 4.8. For F(x, y) = (x + 2y, 3x + 4y) and any ~a ∈ R^2,

DF(~a)(h_1, h_2) = (h_1 + 2h_2, 3h_1 + 4h_2).

To see this, write ~a = (a_1, a_2). Then,

lim_{(h_1,h_2)→(0,0)} [F(a_1 + h_1, a_2 + h_2) − F(a_1, a_2) − (h_1 + 2h_2, 3h_1 + 4h_2)] / √(h_1^2 + h_2^2)
= lim_{(h_1,h_2)→(0,0)} [(a_1 + h_1 + 2a_2 + 2h_2, 3a_1 + 3h_1 + 4a_2 + 4h_2) − (a_1 + 2a_2, 3a_1 + 4a_2) − (h_1 + 2h_2, 3h_1 + 4h_2)] / √(h_1^2 + h_2^2)
= lim_{(h_1,h_2)→(0,0)} (0, 0)/√(h_1^2 + h_2^2) = (0, 0),

as desired.

More generally, if F is a linear map then for any vector ~a, DF(~a) = F. This makes sense: if F is linear then the best linear approximation to F is F itself.

We have been calling DF (~a) the derivative of F at ~a, but a priori there might be more than one. As you might suspect, this is not the case:

Lemma 4.9. Let F : R^n → R^m be a map and let ~a be a point in R^n. Suppose that L and M are both linear maps such that

lim_{~h→~0} [F(~a + ~h) − F(~a) − L(~h)]/‖~h‖ = ~0   and   lim_{~h→~0} [F(~a + ~h) − F(~a) − M(~h)]/‖~h‖ = ~0.

Then L = M.

Proof. If L and M are different linear maps then there is some vector ~k so that L(~k) ≠ M(~k). Let’s consider taking the limit ~h → ~0 along the line ~h = t~k. We have

~0 = lim_{t→0} [F(~a + t~k) − F(~a) − M(t~k)]/‖t~k‖ − lim_{t→0} [F(~a + t~k) − F(~a) − L(t~k)]/‖t~k‖
   = lim_{t→0} [L(t~k) − M(t~k)]/‖t~k‖
   = lim_{t→0} [tL(~k) − tM(~k)]/(|t|‖~k‖).

But the limit on the right-hand side does not exist: if t > 0 then we get (L(~k) − M(~k))/‖~k‖, while if t < 0 we get −(L(~k) − M(~k))/‖~k‖.

So, our supposition that L(~k) ≠ M(~k) must have been false. 


4.3. The Jacobian matrix. Our definition of the total derivative should seem like a useful one, and it generalizes the cases we already had, but some questions remain. Chief among them:

(1) How do you tell if a function R^n → R^m is differentiable? (And, are many of the functions one encounters differentiable?)
(2) How do you compute the total derivative of a function R^n → R^m?

The next theorem answers both questions; we will state it and then unpack it in some examples.

Theorem 1. Suppose that F : R^n → R^m. Write F = (f_1, ..., f_m), where f_i : R^n → R. If for all i and j, ∂f_i/∂x_j is (defined and) continuous near ~a, then F is differentiable at ~a, and the matrix for DF(~a) is given by

[DF(~a)] = [ ∂f_1/∂x_1  ∂f_1/∂x_2  ···  ∂f_1/∂x_n ]
           [ ∂f_2/∂x_1  ∂f_2/∂x_2  ···  ∂f_2/∂x_n ]
           [    ...        ...             ...    ]
           [ ∂f_m/∂x_1  ∂f_m/∂x_2  ···  ∂f_m/∂x_n ]

To avoid being sidetracked, we won’t prove this theorem; its proof is similar to the proof of Stewart’s Theorem 8 in Section 14.4 (proved in Appendix F). (It is fairly easy to see that if F is differentiable at ~a then [DF(~a)] has the specified form. Slightly harder is to show that if all of the partial derivatives of the components of F are continuous then F is differentiable.) The matrix [DF(~a)] is called the total derivative matrix or Jacobian matrix of F at ~a.

Example 4.10. For the function F(x, y) = (x + y^2, x^3 + 5y) from Example 4.7, the Jacobian matrix at (1, 1) is

[DF((1, 1))] = [ 1  2 ]
               [ 3  5 ]

(Why?) This is, indeed, the matrix associated to the linear map from Example 4.7.
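One way to sanity-check such a Jacobian is to approximate each partial derivative by a finite difference. Here is a minimal NumPy sketch (the central-difference helper and the step size are ad hoc choices) that recovers the matrix above.

    import numpy as np

    def F(v):
        # F(x, y) = (x + y^2, x^3 + 5y), as in Example 4.10.
        x, y = v
        return np.array([x + y**2, x**3 + 5*y])

    def numerical_jacobian(F, a, h=1e-6):
        # Approximate the Jacobian of F at a by central differences, one column per variable.
        a = np.asarray(a, dtype=float)
        cols = []
        for j in range(len(a)):
            e = np.zeros_like(a)
            e[j] = h
            cols.append((F(a + e) - F(a - e)) / (2*h))
        return np.column_stack(cols)

    print(numerical_jacobian(F, [1.0, 1.0]))   # approximately [[1, 2], [3, 5]]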

Example 4.11. Suppose F is the “turn cylindrical coordinates into rectangular coordinates” map from Example 1.8. Then

[DF(r, θ, z)] = [ cos(θ)  −r sin(θ)  0 ]
                [ sin(θ)   r cos(θ)  0 ]
                [   0          0     1 ]

so, evaluating at r = 5, θ = π/3, z = 0,

[DF(5, π/3, 0)] = [ 1/2    −5√3/2  0 ]
                  [ √3/2     5/2   0 ]
                  [  0        0    1 ]

Example 4.12. Problem. Suppose F : R^2 → R^2 is a differentiable map, F(1, 1) = (5, 8) and

[DF(1, 1)] = …

Estimate F(1.1, 1.2).

Solution. Since F is differentiable,

F((1, 1) + (.1, .2)) ≈ F(1, 1) + DF(1, 1)(.1, .2).
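The matrix [DF(1, 1)] from this example is not reproduced above, so the following minimal NumPy sketch uses a placeholder Jacobian purely to illustrate the arithmetic of the estimate; its entries are hypothetical.

    import numpy as np

    F_at_11 = np.array([5.0, 8.0])      # F(1, 1) = (5, 8), as given in Example 4.12
    DF_at_11 = np.array([[1.0, 2.0],    # placeholder Jacobian -- the actual matrix from the
                         [3.0, 4.0]])   # example is not shown here, so these entries are hypothetical
    h = np.array([0.1, 0.2])            # the displacement (1.1, 1.2) - (1, 1)

    estimate = F_at_11 + DF_at_11 @ h   # F((1,1) + h) is approximately F(1,1) + DF(1,1)(h)
    print(estimate)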


Figure 4. Multiplying matrices. Arrows indicate what your fingers should do when computing three of the four entries.

the matrix product AB of A and B is the m × p matrix whose (i, j) entry^2 is

c_{i,j} = a_{i,1}b_{1,j} + a_{i,2}b_{2,j} + ··· + a_{i,n}b_{n,j}.

That is, to find the (i, j) entry of AB you run your left index finger across the i-th row of A and your right index finger down the j-th column of B. Multiply the pairs of numbers your fingers touch simultaneously and add the results. See Figure 4.
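The entry-by-entry rule translates directly into code. Here is a minimal Python sketch of it (written from scratch for illustration), with an assertion enforcing the condition that the width of A equal the height of B.

    def matmul(A, B):
        # Multiply an m x n matrix A by an n x p matrix B: entry C[i][j] runs the
        # i-th row of A against the j-th column of B and sums the products.
        m, n, p = len(A), len(B), len(B[0])
        assert all(len(row) == n for row in A), "width of A must equal height of B"
        C = [[0.0]*p for _ in range(m)]
        for i in range(m):
            for j in range(p):
                C[i][j] = sum(A[i][k]*B[k][j] for k in range(n))
        return C

    print(matmul([[1, 2], [3, 4]], [[0, 1], [1, 0]]))   # [[2, 1], [4, 3]]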

Example 5.1. A few examples of matrix products:

[ 2  3 ]
[ 0  4 ] …

(By the way, I did in fact use my fingers when computing these products.)

Notice that the product of a matrix with a column vector as in Section 3 is a special case of multiplication of two matrices.

^2 I.e., the entry in the i-th row and j-th column.


Theorem 2. If F : R^p → R^n and G : R^n → R^m are linear maps then [G ◦ F] = [G][F].

Proof. We will only do the case m = n = p = 2; the general case is similar (but with indices). Write

F(x, y) = (ax + by, cx + dy)
G(x, y) = (ex + fy, gx + hy),

so on the one hand we have

G ◦ F(x, y) = ((ae + cf)x + (be + df)y, (ag + ch)x + (bg + dh)y),

[G ◦ F] = [ ae + cf   be + df ]
          [ ag + ch   bg + dh ]

and on the other hand we have

[G] = [ e  f ]
      [ g  h ]

[F] = [ a  b ]
      [ c  d ]

[G][F] = [ ae + cf   be + df ]
         [ ag + ch   bg + dh ]

So, indeed, [G][F ] = [G ◦ F ], as desired. 
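As a numerical sanity check of Theorem 2, the following minimal NumPy sketch builds two concrete 2 × 2 linear maps (the entries are chosen arbitrarily), reads off the matrix of their composition from its values on the standard basis vectors, and compares it with the matrix product.

    import numpy as np

    MF = np.array([[1.0, 2.0], [3.0, 4.0]])   # matrix of a linear map F
    MG = np.array([[5.0, 6.0], [7.0, 8.0]])   # matrix of a linear map G

    def F(v): return MF @ v
    def G(v): return MG @ v

    # Matrix of G o F, built column by column from the standard basis vectors...
    composed = np.column_stack([G(F(e)) for e in np.eye(2)])
    print(composed)
    # ...agrees with the matrix product [G][F]:
    print(MG @ MF)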

Corollary 5.2. The matrix product is associative. That is, if A is an m × n matrix, B is an n × p matrix and C is a p × r matrix then (AB)C = A(BC).

Proof. The matrix product corresponds to composition of functions, and composition of functions is (obviously) associative. (We could also verify this directly, by writing out all of the sums—but they become painfully complicated.) 

Notice that we have only defined the matrix product AB when the width of A is the same as the height of B. If the width of A is not the same as the height of B then AB is simply not defined. This corresponds to the fact that given maps F : R^q → R^p and G : R^n → R^m, G ◦ F only makes sense if p = n.

Example 5.3. The product of the 2 × 3 matrix

[ 1  2  3 ]
[ 4  5  6 ]

with another 2 × 3 matrix is not defined: the first matrix has 3 columns but the second has only 2 rows. This corresponds to the fact that you can’t compose a map R^3 → R^2 with another map R^3 → R^2.

If you use your fingers, it’s obvious when a matrix product isn’t defined: one finger runs out of entries before the other.

Example 5.4. Even when the products in both orders are defined, matrix multiplication is typically non-commutative. For example:

[ 1  2 ]
[ 3  4 ] …


We might drop the subscripts n × n or m × n from I_{n×n} or 0_{m×n} when the dimensions are either obvious or irrelevant. The properties of matrix arithmetic are summarized as follows.

Lemma 5.7. Suppose A, B, C and D are n×m, n×m, m×l and p×n matrices, respectively. Then:

(1) (Addition is commutative:) A + B = B + A.
(2) (Additive identity:) 0_{n×m} + A = A.
(3) (Multiplicative identity:) I_{n×n}A = A = AI_{m×m}.
(4) (Multiplicative zero:) A 0_{m×k} = 0_{n×k} and 0_{p×n}A = 0_{p×m}.
(5) (Distributivity:) (A + B)C = AC + BC and D(A + B) = DA + DB.

(All of the properties are easy to check directly, and also follow from corresponding properties of the arithmetic of linear maps.)

Exercise 5.8. Square the matrix A = …, i.e., multiply A by itself. Explain geometrically why the answer makes sense. (Hint: what linear map does A correspond to? What happens if you do that map twice?)

Exercise 5.9. Square the matrix B = …. Does anything surprise you about the answer?

Challenge Problem 5.10. Find a matrix A so that A ≠ I but A^3 = I. (Hint: think geometrically.)

6. The chain rule for total derivatives

With the terminology we have developed, the chain rule is very succinct: D(G ◦ F) = (DG) ◦ (DF).

More precisely:

Theorem 3. Let F : R^p → R^n and G : R^n → R^m be maps so that F is differentiable at ~a and G is differentiable at F(~a). Then

(6.1) D(G ◦ F)(~a) = DG(F(~a)) ◦ DF(~a).

In particular, the Jacobian matrices satisfy

[D(G ◦ F)(~a)] = [DG(F(~a))][DF(~a)].

We will prove the theorem a little later. First, some examples and remarks. Notice that Equation (6.1) looks a lot like the chain rule for functions of one variable, (g ◦ f)′(x) = f ′(x)g′(f(x)). Of course, the one-variable chain rule is a (very) special case of Theorem 3, so this isn’t a total surprise. But it’s nice that we have phrased things, and chosen notation, so that the similarity manifests itself.

Example 6.2. Question. Let F(x, y) = (x + y^2, x^3 + 5y) and G(x, y) = (y^2, x^2). Compute [D(G ◦ F)] at (1, 1).

Solution. We already computed in Example 4.10 that

[DF(1, 1)] = [ 1  2 ]
             [ 3  5 ]


Computing,

F(1, 1) = (2, 6)

[DG] = [ 0   2y ]
       [ 2x   0 ]

[DG(2, 6)] = [ 0  12 ]
             [ 4   0 ]

So, by the chain rule,

[D(G ◦ F)(1, 1)] = [DG(2, 6)][DF(1, 1)] = [ 36  60 ]
                                          [  4   8 ]

As a sanity check, notice that the dimensions are right: G ◦ F is a map R^2 → R^2, so its Jacobian should be a 2 × 2 matrix. Note also that we could compute [D(G ◦ F)] by composing G and F first and then differentiating. We should get the same answer; this would be a good check to do.
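That check is easy to carry out numerically: the sketch below composes G and F first and then differentiates the composite by central differences (the helper and step size are ad hoc), recovering the same matrix.

    import numpy as np

    def F(v):
        x, y = v
        return np.array([x + y**2, x**3 + 5*y])

    def G(v):
        x, y = v
        return np.array([y**2, x**2])

    def GoF(v):
        # The composite G o F.
        return G(F(v))

    def numerical_jacobian(H, a, h=1e-6):
        # Central-difference Jacobian of H at a, one column per input variable.
        a = np.asarray(a, dtype=float)
        cols = []
        for j in range(len(a)):
            e = np.zeros_like(a)
            e[j] = h
            cols.append((H(a + e) - H(a - e)) / (2*h))
        return np.column_stack(cols)

    print(numerical_jacobian(GoF, [1.0, 1.0]))   # approximately [[36, 60], [4, 8]]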

Example 6.3. Question. Let F(x, y) = (x + y^2, x^3 + 5y) and G(x, y) = 2xy. Compute [D(G ◦ F)(1, 1)].

Solution. As in the previous example, F(1, 1) = (2, 6) and

[DF(1, 1)] = [ 1  2 ]
             [ 3  5 ]

Computing,

[DG] = [ 2y  2x ]

[DG(2, 6)] = [ 12  4 ]

So, by the chain rule,

[D(G ◦ F)(1, 1)] = [DG(2, 6)][DF(1, 1)] = [ 24  44 ]

Example 6.4. Question. An ant is crawling on a surface. The temperature at a point (x, y) is T(x, y) = 70 + 5 sin(x) cos(y). The ant’s position at time t is γ(t) = (t cos(πt), t sin(πt)). How fast is the temperature under the ant changing at t = 5?

Solution. We are interested in the derivative of T ◦ γ(t) at t = 5. Computing,

γ(5) = (5 cos(5π), 5 sin(5π)) = (−5, 0)

[Dγ] = [ cos(πt) − πt sin(πt) ]
       [ sin(πt) + πt cos(πt) ]

[Dγ(5)] = [ −1  ]
          [ −5π ]

[DT] = [ 5 cos(x) cos(y)   −5 sin(x) sin(y) ]

[DT(−5, 0)] = [ 5 cos(5)   0 ]

So, by the chain rule,

[D(T ◦ γ)(5)] = [DT(−5, 0)][Dγ(5)] = (−5 cos(5)).

So, the answer is −5 cos(5) (degrees per second, or whatever).
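A quick numerical cross-check of this answer (a minimal NumPy sketch; the step size is arbitrary): differentiate T ◦ γ directly by a central difference at t = 5 and compare with −5 cos(5).

    import numpy as np

    def T(p):
        # Temperature at the point (x, y).
        x, y = p
        return 70 + 5*np.sin(x)*np.cos(y)

    def gamma(t):
        # The ant's position at time t.
        return np.array([t*np.cos(np.pi*t), t*np.sin(np.pi*t)])

    h = 1e-6
    numeric = (T(gamma(5 + h)) - T(gamma(5 - h))) / (2*h)
    print(numeric, -5*np.cos(5))   # both approximately -1.418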