
The Multivariable Chain Rule

Nikhil Srivastava

February 11, 2015

The chain rule is a simple consequence of the fact that differentiation produces the linear approximation to a function at a point, and that the derivative is the coefficient appearing in this linear approximation. Let’s see this for the single variable case first. It is especially transparent using o() notation, where once again f(x) = o(g(x)) means that

lim_{x→0} f(x)/g(x) = 0.

Suppose we are interested in computing the derivative of (f ◦ g)(x) = f(g(x)) at x, where f and g are both differentiable functions from R to R. Since g is differentiable, we have (by the definition of differentiation as a limit):

g(x + ∆x) = g(x) + g′(x)∆x + o(∆x)

for a number g′(x) which we call the derivative of g at x. In words, this says that g is well-approximated by its linear approximation in a neighborhood of x. Similarly, we have

f(y + ∆y) = f(y) + f′(y)∆y + o(∆y).

Letting y = g(x) and ∆y = g′(x)∆x + o(∆x), we now find that

(f ◦ g)(x + ∆x) = f(g(x + ∆x))
                = f(g(x) + g′(x)∆x + o(∆x))
                = f(g(x)) + f′(g(x)) (g′(x)∆x + o(∆x)) + o(∆y)
                = f(g(x)) + f′(g(x)) · g′(x)∆x + o(∆x) + o(∆y),   since f′(g(x)) · o(∆x) = o(∆x).

Thus, we have

lim_{∆x→0} [(f ◦ g)(x + ∆x) − (f ◦ g)(x)] / ∆x
  = lim_{∆x→0} [f′(g(x)) · g′(x)∆x + o(∆x) + o(∆y)] / ∆x
  = f′(g(x)) · g′(x)

(the o(∆y) term is also o(∆x), since ∆y = g′(x)∆x + o(∆x) = O(∆x)),

establishing the chain rule.
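The derivation above can be sanity-checked numerically: a finite-difference derivative of f ◦ g should agree with f′(g(x)) · g′(x). A minimal sketch, using the hypothetical example functions f(y) = sin y and g(x) = x², which are not from the notes:

```python
import math

# Hypothetical example functions (not from the notes): f(y) = sin(y), g(x) = x**2.
def f(y):
    return math.sin(y)

def g(x):
    return x * x

def fg(x):
    return f(g(x))

x = 1.3
dx = 1e-6

# Symmetric finite-difference approximation to (f ∘ g)'(x).
numeric = (fg(x + dx) - fg(x - dx)) / (2 * dx)

# Chain rule prediction: (f ∘ g)'(x) = f'(g(x)) * g'(x) = cos(x**2) * 2x.
chain = math.cos(x * x) * 2 * x

assert abs(numeric - chain) < 1e-6
```

The finite-difference quotient differs from the chain-rule value only by the o(∆x)/∆x terms, which is why a small step size makes the two agree to many digits.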

A very similar thing happens in the multivariable case. Suppose f : R^2 → R and g : R^2 → R^2 are differentiable. To parallel the notation used in class, let z = f(x, y) and (x, y) = g(s, t). Since both functions are differentiable, they must have linear approximations:

f(x + ∆x, y + ∆y) = f((x, y) + (∆x, ∆y)) ≈ f(x, y) + Lf(∆x, ∆y),   (∗)
g(s + ∆s, t + ∆t) = g((s, t) + (∆s, ∆t)) ≈ g(s, t) + Lg(∆s, ∆t),   (∗∗)

where Lf : R^2 → R and Lg : R^2 → R^2 are linear functions, and I have used ≈ to indicate equality up to o(∆x) terms.¹ But we know that all linear functions are implemented by matrices, so there must be a 1 × 2 matrix Df such that

Lf(∆x, ∆y) = Df [ ∆x ]
                [ ∆y ].

In fact, we know exactly what this matrix is (by comparing coefficients):

Df = [ ∂f/∂x   ∂f/∂y ]

so that we have the explicit formula

Lf(∆x, ∆y) = [ ∂f/∂x   ∂f/∂y ] [ ∆x ]
                               [ ∆y ]
           = (∂f/∂x) ∆x + (∂f/∂y) ∆y,

which is the same as what is given by the total differential df. Repeating this process for Lg, we get that for the 2 × 2 matrix

Dg = [ ∂x/∂s   ∂x/∂t ]
     [ ∂y/∂s   ∂y/∂t ]

the linear approximation of g at (s, t) is given by

Lg(∆s, ∆t) = Dg [ ∆s ]
                [ ∆t ]

           = [ ∂x/∂s   ∂x/∂t ] [ ∆s ]
             [ ∂y/∂s   ∂y/∂t ] [ ∆t ]

           = [ (∂x/∂s) ∆s + (∂x/∂t) ∆t ]
             [ (∂y/∂s) ∆s + (∂y/∂t) ∆t ].
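As a sanity check on this formula, one can compare g(s + ∆s, t + ∆t) against the linear approximation g(s, t) + Dg [∆s, ∆t] numerically. A minimal sketch, using a hypothetical example g(s, t) = (st, s + t²) that is not from the notes:

```python
import numpy as np

# Hypothetical example map (not from the notes): g(s, t) = (s*t, s + t**2).
def g(s, t):
    return np.array([s * t, s + t ** 2])

def Dg(s, t):
    # Jacobian [[∂x/∂s, ∂x/∂t], [∂y/∂s, ∂y/∂t]], computed by hand for this g.
    return np.array([[t,   s],
                     [1.0, 2 * t]])

s, t = 0.5, -1.2
ds, dt = 1e-4, -2e-4

exact = g(s + ds, t + dt)
linear = g(s, t) + Dg(s, t) @ np.array([ds, dt])

# The linear approximation should be off only by o(∆) terms,
# i.e. the error should be much smaller than the step size itself.
err = np.linalg.norm(exact - linear)
assert err < 1e-7
```

For this quadratic g the error consists exactly of the second-order terms (∆s ∆t and ∆t²), so it shrinks quadratically as the step size shrinks.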

Now for the punch line: just as in the univariate case, we write:

(f ◦ g)((s, t) + (∆s, ∆t)) = f(g((s, t) + (∆s, ∆t)))

  ≈ f( g(s, t) + Dg [ ∆s ] )        by (∗∗)
                    [ ∆t ]

  ≈ f(g(s, t)) + Df Dg [ ∆s ]       by (∗), treating Dg [ ∆s ] as [ ∆x ].
                       [ ∆t ]                           [ ∆t ]    [ ∆y ]

In other words, the derivative matrix of the composition f ◦ g is the 1 × 2 matrix Df Dg: the matrix of the composition is the product of the matrices, which is exactly the multivariable chain rule.

¹ There is a subtlety about uniform convergence vs. pointwise convergence here, but for the purposes of this course you can ignore it, and what is written here is good enough.
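The punch line — that the derivative matrix of f ◦ g is the product Df Dg — can also be verified numerically. A minimal sketch, using the hypothetical example functions f(x, y) = x²y and g(s, t) = (st, s + t²), neither of which is from the notes; Df is the gradient of f written as a 1 × 2 Jacobian row:

```python
import numpy as np

# Hypothetical examples (not from the notes): f(x, y) = x**2 * y, g(s, t) = (s*t, s + t**2).
def f(xy):
    x, y = xy
    return x ** 2 * y

def g(st):
    s, t = st
    return np.array([s * t, s + t ** 2])

def Df(xy):
    # 1x2 Jacobian of f (the gradient as a row): [∂f/∂x, ∂f/∂y].
    x, y = xy
    return np.array([[2 * x * y, x ** 2]])

def Dg(st):
    # 2x2 Jacobian of g: [[∂x/∂s, ∂x/∂t], [∂y/∂s, ∂y/∂t]].
    s, t = st
    return np.array([[t,   s],
                     [1.0, 2 * t]])

st = np.array([0.7, 1.1])

# Chain rule prediction: D(f ∘ g) at (s, t) is Df(g(s, t)) · Dg(s, t), a 1x2 matrix.
chain = Df(g(st)) @ Dg(st)

# Compare against central finite differences of the composition in s and t.
h = 1e-6
numeric = np.array([[(f(g(st + h * e)) - f(g(st - h * e))) / (2 * h)
                     for e in np.eye(2)]])

assert np.allclose(chain, numeric, atol=1e-5)
```

The matrix product Df Dg collects exactly the terms the scalar chain rule produces for each of ∂z/∂s and ∂z/∂t, which is why the two rows agree to within finite-difference error.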