Derivations of common matrix-derivative formulas

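0. Preface

1. Before reading this article, please read the two earlier articles in this series, in order; both are listed at the end of this article. Below they are referred to as the "Essence article" (本质篇) and the "Basics article" (基础篇).

2. This article introduces the more advanced techniques for differentiating real scalar functions of a vector variable and real scalar functions of a matrix variable: the trace \mathbb{tr}(\pmb{A}) and the first-order real matrix differential \mathbb{d}\pmb{X}. (The derivations make use of real matrix functions of a matrix variable, but differentiating such functions is outside the scope of this article.)

3. How does this article differ from the previous two, and when should each be used? In calculus the derivative is defined as a limit, but in practice nobody differentiates from the definition; we use known formulas (power functions, exponentials and so on) together with the product rule, the chain rule, etc. Matrix differentiation is the same: in practice we do not use the definition-based approach of the Essence and Basics articles, but the rules presented here.

4. The notation is the same as in the Essence and Basics articles.

5. To follow this article you need the material covered in the Essence and Basics articles, undergraduate linear algebra (determinants, adjugate matrices, inverse matrices) and undergraduate calculus (differentials and total differentials). Nothing else is required.

6. The first two sections, "I. The trace of a matrix" and "II. Differentials and total differentials", are prerequisites. If you already know them well you can skip to "III. Matrix differentials" (though a read-through is still recommended as a refresher).

7. There is also a matrix-calculus website where you can check whether your own results are correct.

I. The trace of a matrix [1]

1. Definition

The sum of the main-diagonal entries of an n \times n square matrix \pmb{A}_{n \times n} is called the trace of \pmb{A}, written \mathbb{tr}(\pmb{A}). That is, for

\pmb{A}_{n \times n}= \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \\ \end{bmatrix}_{n \times n} \\\\

the trace of \pmb{A} is

\mathbb{tr}(\pmb{A})=a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^n{a_{ii}} \\\\ \tag{1}

Note: only square matrices have a trace.

2. Some properties (important; they are used below and worth memorizing)

2.1 Trace of a scalar

A scalar x can be viewed as a 1 \times 1 matrix, so its trace is itself:

x=\mathbb{tr}(x) \\\\ \tag{2}

2.2 Linearity

The trace of a sum is the sum of the traces, and scalar factors can be pulled out:

\mathbb{tr}(c_1\pmb{A}+c_2\pmb{B}) = c_1\mathbb{tr}(\pmb{A})+c_2\mathbb{tr}(\pmb{B}) \\\\ \tag{3}

where c_1,c_2 are scalars.

Proof:

\begin{align} \mathbb{tr}(c_1\pmb{A}+c_2\pmb{B})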
&= \mathbb{tr} \begin{bmatrix} c_1a_{11}+c_2b_{11} & c_1a_{12}+c_2b_{12} & \cdots & c_1a_{1n}+c_2b_{1n} \\ c_1a_{21}+c_2b_{21} & c_1a_{22}+c_2b_{22} & \cdots & c_1a_{2n}+c_2b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_1a_{n1}+c_2b_{n1} & c_1a_{n2}+c_2b_{n2} & \cdots & c_1a_{nn}+c_2b_{nn} \\ \end{bmatrix}
\\\\
&= (c_1a_{11}+c_2b_{11})+(c_1a_{22}+c_2b_{22})+\cdots + (c_1a_{nn}+c_2b_{nn})
\\\\
&= c_1(a_{11}+a_{22}+\cdots+a_{nn}) + c_2(b_{11}+b_{22}+\cdots+b_{nn})
\\\\
&= c_1\mathbb{tr}(\pmb{A})+c_2\mathbb{tr}(\pmb{B})
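\end{align} \\\\ \tag{4}

Q.E.D.

2.3 Transpose

The trace of a transpose equals the trace of the original matrix:

\mathbb{tr}(\pmb{A})=\mathbb{tr}(\pmb{A}^T) \\\\ \tag{5}

Proof: transposition leaves the main-diagonal entries unchanged, so the identity holds. Q.E.D.

2.4 What the trace of a product really is

For two matrices \pmb{A}_{m\times n},\pmb{B}_{m\times n} of the same size m \times n, the trace of one multiplied (on either side) by the transpose of the other is the sum of the products of their corresponding entries. It can be regarded as the generalization of the vector dot product to matrices:

\begin{align} \mathbb{tr}(\pmb{A}\pmb{B}^T)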
&= a_{11}b_{11}+a_{12}b_{12}+\cdots+a_{1n}b_{1n}\\ &+ a_{21}b_{21}+a_{22}b_{22}+\cdots+a_{2n}b_{2n}\\ &+ \cdots \\ &+ a_{m1}b_{m1}+a_{m2}b_{m2}+\cdots+a_{mn}b_{mn} \end{align} \\\\ \tag{6}

Proof:

\begin{align} \mathbb{tr}(\pmb{A}\pmb{B}^T) &=\mathbb{tr}( \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ \end{bmatrix} \begin{bmatrix} b_{11} & b_{21} & \cdots & b_{m1} \\ b_{12} & b_{22} & \cdots & b_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ b_{1n} & b_{2n} & \cdots & b_{mn} \\ \end{bmatrix} )
\\\\
&= \mathbb{tr} \begin{bmatrix} a_{11}b_{11}+a_{12}b_{12}+\cdots+a_{1n}b_{1n} & \ast & \cdots & \ast \\ \ast & a_{21}b_{21}+a_{22}b_{22}+\cdots+a_{2n}b_{2n} & \cdots & \ast \\ \vdots & \vdots & \ddots & \vdots \\ \ast & \ast & \cdots & a_{m1}b_{m1}+a_{m2}b_{m2}+\cdots+a_{mn}b_{mn} \\ \end{bmatrix}_{m \times m} \\\\ &= a_{11}b_{11}+a_{12}b_{12}+\cdots+a_{1n}b_{1n}\\ &+ a_{21}b_{21}+a_{22}b_{22}+\cdots+a_{2n}b_{2n}\\ &+ \cdots \\ &+ a_{m1}b_{m1}+a_{m2}b_{m2}+\cdots+a_{mn}b_{mn} \end{align} \\\\ \tag{7}

Here \ast marks off-diagonal entries, which do not affect the trace. Q.E.D.

2.5 Commutativity

Swapping the two factors of a matrix product leaves the trace unchanged:

\mathbb{tr}(\pmb{A}\pmb{B})= \mathbb{tr}(\pmb{B}\pmb{A}) \\\\ \tag{8}

where \pmb{A}_{m \times n},\pmb{B}_{n \times m}.

Proof: regard \pmb{B}_{n \times m} as the transpose of (\pmb{B}^T)_{m \times n}. By the product property (6), whichever order the product is taken in, the trace is the sum of the products of corresponding entries of \pmb{A}_{m \times n} and (\pmb{B}^T)_{m \times n}, which does not change. Q.E.D.

2.6 Cyclic permutation of more factors

\mathbb{tr}(\pmb{A}\pmb{B}\pmb{C})=\mathbb{tr}(\pmb{C}\pmb{A}\pmb{B})=\mathbb{tr}(\pmb{B}\pmb{C}\pmb{A}) \\\\ \tag{9}

where \pmb{A}_{m \times n},\pmb{B}_{n \times p},\pmb{C}_{p \times m}.

Proof: treat the product of two of the matrices as a single matrix and apply commutativity (8) to it and the remaining matrix. Q.E.D.

2.7 Putting the properties to work

\mathbb{tr}(\pmb{A}\pmb{B}^T) = \mathbb{tr}(\pmb{B}^T\pmb{A}) = \mathbb{tr}(\pmb{A}^T\pmb{B}) = \mathbb{tr}(\pmb{B}\pmb{A}^T)
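\\\\ \tag{10}

where \pmb{A}_{m \times n},\pmb{B}_{m \times n}.

Proof: the first equality is commutativity (8), the second is the transpose property (5), and the third is commutativity again. Q.E.D.

II. Differentials and total differentials

We first review differentials and total differentials from undergraduate calculus.

1. Differential of a function of one variable

1.1 Differential of an ordinary function [2]

Let y=f(x) be differentiable. Its differential is

\mathbb{d}y=\mathbb{d}f(x)=f'(x)\mathbb{d}x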
\\\\ \tag{11}

1.2 Differential of a composite function [3]

Let y=f(u),u=g(x), both differentiable. The differential of y is

\mathbb{d}y=\mathbb{d}f(u)=f'(u)\mathbb{d}u=f'(u)\mathbb{d}g(x)=f'(u)g'(x)\mathbb{d}x \\\\ \tag{12}

This looks complicated, but an example makes it simple. Let y=\sin(2x+1),u=2x+1. Then the differential of y is

\begin{align} \mathbb{d}y&=\mathbb{d}(\sin{u})=\cos{u}\mathbb{d}u=\cos(2x+1)\mathbb{d}(2x+1) \\\\ &=\cos(2x+1) \cdot 2 \mathbb{d}x=2\cos(2x+1) \mathbb{d}x \end{align} \\\\ \tag{13}

2. Total differential of a function of several variables

2.1 Total differential of an ordinary function [4]

Let z=f(x,y) be differentiable. Its total differential is

\mathbb{d}z=\frac{\partial z}{\partial x}\mathbb{d}x+\frac{\partial z}{\partial y}\mathbb{d}y \\\\ \tag{14}

2.2 Total differential of a composite function

Let z=f(u),u=\varphi(x,y), where f has a derivative and \varphi is differentiable. The total differential of z is

\begin{align} \mathbb{d}z& =\mathbb{d}f(u)=f'(u)\mathbb{d}u=f'(u)(\frac{\partial u}{\partial x}\mathbb{d}x+\frac{\partial u}{\partial y}\mathbb{d}y) \\\\
& =
f'(u)\frac{\partial u}{\partial x}\mathbb{d}x+f'(u)\frac{\partial u}{\partial
y}\mathbb{d}y
\end{align}
\\\\ \tag{15}

For example, let z=\sin(2x+y^2),u=2x+y^2. Then the total differential of z is

\begin{align} \mathbb{d}z&=\mathbb{d}(\sin u)=\cos u\mathbb{d}u=\cos(2x+y^2)\mathbb{d}(2x+y^2) \\\\ &=\cos(2x+y^2)(2\mathbb{d}x+2y\mathbb{d}y) \\\\ &= 2\cos(2x+y^2)\mathbb{d}x+2y\cos(2x+y^2)\mathbb{d}y
\end{align}
\\\
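As a quick sanity check of the example above, the following sketch compares the actual change of z=\sin(2x+y^2) under a small step with the change predicted by its total differential. The evaluation point, the step sizes and the function names are illustrative assumptions, not part of the original text.

```python
import math


def z(x, y):
    """The example function z = sin(2x + y^2)."""
    return math.sin(2 * x + y ** 2)


def total_differential(x, y, dx, dy):
    """dz = 2 cos(2x + y^2) dx + 2 y cos(2x + y^2) dy, as derived above."""
    c = math.cos(2 * x + y ** 2)
    return 2 * c * dx + 2 * y * c * dy


# Arbitrary evaluation point and small increments (assumed for illustration).
x0, y0 = 0.3, -0.7
dx, dy = 1e-6, -2e-6

actual_change = z(x0 + dx, y0 + dy) - z(x0, y0)
predicted_change = total_differential(x0, y0, dx, dy)

print(actual_change, predicted_change)
```

The two printed values should agree up to terms of second order in the step size, which is what the total differential promises.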
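3. Rules for differentials and total differentials [5]

3.1 Differential of a constant

\mathbb{d}c=0 \\\\ \tag{16_1}

where c is a constant.

3.2 Linearity

The differential of a sum is the sum of the differentials, and constant factors can be pulled out:

\mathbb{d}(c_1u+c_2v) = c_1\mathbb{d}u+c_2\mathbb{d}v \\\\ \tag{16_2}

where u=u(x),v=v(x) are functions of one variable or u=u(x,y),v=v(x,y) are functions of several variables, and c_1,c_2 are constants.

3.3 Product rule

Differentiate the first factor and keep the second, then add the first factor times the differential of the second:

\mathbb{d}(uv)=\mathbb{d}(u)v+u\mathbb{d}(v) \\\\ \tag{16_3}

where u=u(x),v=v(x) are functions of one variable or u=u(x,y),v=v(x,y) are functions of several variables.

3.4 Quotient rule

The differential of the numerator times the denominator, minus the numerator times the differential of the denominator, all divided by the square of the denominator:

\mathbb{d}(\frac{u}{v})=\frac{1}{v^2}(\mathbb{d}(u)v-u\mathbb{d}(v) ) \\\\ \tag{16_4}

where v=v(x) \neq0,u=u(x) are functions of one variable or v=v(x,y) \neq 0, u=u(x,y) are functions of several variables.

III. Matrix differentials

1. Real scalar function of a vector variable [6]

f(\pmb{x}),\pmb{x}=[x_1,x_2,\cdots,x_n]^T \\\\

This is simply a function of several variables. Assuming it is differentiable, its total differential follows (14):

\begin{align} \mathbb{d}f(\pmb{x}) &=\frac{\partial f}{\partial x_1}\mathbb{d}x_1+\frac{\partial f}{\partial x_2}\mathbb{d}x_2 + \cdots+\frac{\partial f}{\partial x_n}\mathbb{d}x_n\\\\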
&= (\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n})
\begin{bmatrix} \mathbb{d}x_1 \\ \mathbb{d}x_2\\ \vdots \\ \mathbb{d}x_n \end{bmatrix}
\end{align}
\\\\ \tag{17}

The result is a scalar, so by (2), (17) can be written as a trace:

\begin{align} \mathbb{d}f(\pmb{x})
&= (\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n})
\begin{bmatrix} \mathbb{d}x_1 \\ \mathbb{d}x_2\\ \vdots \\ \mathbb{d}x_n \end{bmatrix} \\\\
&=\mathbb{tr}((\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n})
\begin{bmatrix} \mathbb{d}x_1 \\ \mathbb{d}x_2\\ \vdots \\ \mathbb{d}x_n \end{bmatrix})
\end{align}
\\\\ \tag{18}

2. Real scalar function of a matrix variable [7]

f(\pmb{X}),\pmb{X}_{m\times n}=(x_{ij})_{i=1,j=1}^{m,n} \\\\

This is also a function of several variables. Assuming it is differentiable, its total differential again follows (14):

\begin{align} \mathbb{d}f(\pmb{X}) &=\frac{\partial f}{\partial x_{11}}\mathbb{d}x_{11}+\frac{\partial f}{\partial x_{12}}\mathbb{d}x_{12} + \cdots+\frac{\partial f}{\partial x_{1n}}\mathbb{d}x_{1n}\\ &+\frac{\partial f}{\partial x_{21}}\mathbb{d}x_{21}+\frac{\partial f}{\partial x_{22}}\mathbb{d}x_{22} + \cdots+\frac{\partial f}{\partial x_{2n}}\mathbb{d}x_{2n}\\ &+\cdots\\ &+\frac{\partial f}{\partial x_{m1}}\mathbb{d}x_{m1}+\frac{\partial f}{\partial x_{m2}}\mathbb{d}x_{m2} + \cdots+\frac{\partial f}{\partial x_{mn}}\mathbb{d}x_{mn}
\end{align}
\\\\ \tag{19}

This is exactly the sum of the products of corresponding entries of the matrices (\frac{\partial f}{\partial x_{ij}})_{i=1,j=1}^{m,n} and (\mathbb{d}x_{ij})_{i=1,j=1}^{m,n}. By (6), (19) can therefore also be written as a trace:

\begin{align} \mathbb{d}f(\pmb{X}) &=\frac{\partial f}{\partial x_{11}}\mathbb{d}x_{11}+\frac{\partial f}{\partial x_{12}}\mathbb{d}x_{12} + \cdots+\frac{\partial f}{\partial x_{1n}}\mathbb{d}x_{1n}\\ &+\frac{\partial f}{\partial x_{21}}\mathbb{d}x_{21}+\frac{\partial f}{\partial x_{22}}\mathbb{d}x_{22} + \cdots+\frac{\partial f}{\partial x_{2n}}\mathbb{d}x_{2n}\\ &+\cdots\\ &+\frac{\partial f}{\partial x_{m1}}\mathbb{d}x_{m1}+\frac{\partial f}{\partial x_{m2}}\mathbb{d}x_{m2} + \cdots+\frac{\partial f}{\partial x_{mn}}\mathbb{d}x_{mn}
\\\\
&=\mathbb{tr}( \begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}} \\ \frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{m2}}\\ \vdots&\vdots&\vdots&\vdots\\ \frac{\partial f} {\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial
x_{mn}}
\end{bmatrix}_{n\times m} \begin{bmatrix} \mathbb{d}x_{11}
& \mathbb{d}x_{12} & \cdots
& \mathbb{d}x_{1n} \\ \mathbb{d}x_{21}
& \mathbb{d}x_{22} & \cdots
& \mathbb{d}x_{2n} \\ \vdots&\vdots&\vdots&\vdots\\ \mathbb{d}x_{m1}
& \mathbb{d}x_{m2} & \cdots
& \mathbb{d}x_{mn}
\end{bmatrix}_{m \times n} ) \end{align}
\\\\ \tag{20}

3. Real matrix function of a matrix variable [8]

\pmb{F}(\pmb{X}),\pmb{F}_{p\times q}=(f_{ij})_{i=1,j=1}^{p,q},\pmb{X}_{m \times n}=(x_{ij})_{i=1,j=1}^{m,n} \\\\

As explained in Section I, 3, 3.3 of the Essence article, each entry of a real matrix function of a matrix variable is itself a real scalar function f_{ij}(\pmb{X}) of a matrix variable.

Definition: assuming every f_{ij}(\pmb{X}) is differentiable, the matrix differential of a real matrix function of a matrix variable is obtained by taking the total differential of each entry f_{ij}(\pmb{X}), keeping the layout unchanged:

\begin{align} \mathbb{d}\pmb{F}_{p \times q}(\pmb{X}) &= \begin{bmatrix} \mathbb{d}f_{11}(\pmb{X})& \mathbb{d}f_{12}(\pmb{X}) & \cdots & \mathbb{d}f_{1q}(\pmb{X}) \\ \mathbb{d}f_{21}(\pmb{X})& \mathbb{d}f_{22}(\pmb{X}) & \cdots & \mathbb{d}f_{2q}(\pmb{X}) \\ \vdots&\vdots&\vdots&\vdots \\ \mathbb{d}f_{p1}(\pmb{X})& \mathbb{d}f_{p2}(\pmb{X}) & \cdots & \mathbb{d}f_{pq}(\pmb{X})
\end{bmatrix}_{p \times q}
\end{align}
\\\\ \tag{21}

3.1 Four rules (important; they are used below and worth memorizing)

a. Matrix differential of a constant matrix

\mathbb{d}\pmb{A}_{m \times n} = \pmb{0}_{m \times n} \\\\ \tag{22_1}

Proof: every entry of \pmb{A} is a constant, so by (16_1) the differential of every entry is 0. Q.E.D.

b. Linearity

The differential of a sum is the sum of the differentials, and constant factors can be pulled out:

\mathbb{d}(c_1\pmb{F}(\pmb{X})+c_2\pmb{G}(\pmb{X})) = c_1\mathbb{d}\pmb{F}(\pmb{X})+c_2\mathbb{d}\pmb{G}(\pmb{X}) \\\\ \tag{22_2}

where c_1,c_2 are constants.

Proof: every entry of c_1\pmb{F}(\pmb{X})+c_2\pmb{G}(\pmb{X}) is c_1f_{ij}(\pmb{X})+c_2g_{ij}(\pmb{X}), and by (16_2) its total differential is c_1\mathbb{d}f_{ij}(\pmb{X})+c_2\mathbb{d}g_{ij}(\pmb{X}). Q.E.D.

c. Product rule

Differentiate the first factor and keep the second, then add the first factor times the differential of the second:

\mathbb{d}(\pmb{F}(\pmb{X})\pmb{G}(\pmb{X}))=\mathbb{d}(\pmb{F}(\pmb{X}))\pmb{G}(\pmb{X}) + \pmb{F}(\pmb{X})\mathbb{d}\pmb{G}(\pmb{X}) \\\\ \tag{22_3_1}

where \pmb{F}_{p \times q}(\pmb{X}),\pmb{G}_{q \times s}(\pmb{X}). Note: the differentials here are matrices, so the left-to-right order of the factors must not be changed.

Proof: the (i,j) entry of \pmb{F}(\pmb{X})\pmb{G}(\pmb{X}) is \sum_{k=1}^q[f_{ik}(\pmb{X})g_{kj}(\pmb{X})], and by (16_2) and (16_3) its total differential is

\begin{align} \mathbb{d}\left( \sum_{k=1}^q[f_{ik}(\pmb{X})g_{kj}(\pmb{X})] \right) &=\sum_{k=1}^q \mathbb{d}(f_{ik}(\pmb{X})g_{kj}(\pmb{X})) \\\\ &= \sum_{k=1}^q[\mathbb{d}(f_{ik}(\pmb{X}))g_{kj}(\pmb{X})+f_{ik}(\pmb{X})\mathbb{d}g_{kj}(\pmb{X})] \\\\ &= \sum_{k=1}^q[\mathbb{d}(f_{ik}(\pmb{X}))g_{kj}(\pmb{X})]+ \sum_{k=1}^q[f_{ik}(\pmb{X})\mathbb{d}g_{kj}(\pmb{X})]
\end{align}
\\\\ \tag{22_3_1_a}

The first sum in the result is the (i,j) entry of \mathbb{d}(\pmb{F}(\pmb{X}))\pmb{G}(\pmb{X}), and the second sum is the (i,j) entry of \pmb{F}(\pmb{X})\mathbb{d}\pmb{G}(\pmb{X}). Q.E.D.

The rule extends readily to products of more factors:

\mathbb{d}(\pmb{F}(\pmb{X})\pmb{G}(\pmb{X})\pmb{H}(\pmb{X}))=\mathbb{d}(\pmb{F}(\pmb{X}))\pmb{G}(\pmb{X})\pmb{H}(\pmb{X}) + \pmb{F}(\pmb{X})\mathbb{d}(\pmb{G}(\pmb{X}))\pmb{H}(\pmb{X})+ \pmb{F}(\pmb{X})\pmb{G}(\pmb{X})\mathbb{d}\pmb{H}(\pmb{X}) \\\\ \tag{22_3_2}

Proof:

\begin{align} \mathbb{d}(\pmb{F}(\pmb{X})\pmb{G}(\pmb{X})\pmb{H}(\pmb{X}))
&= \mathbb{d}(\pmb{F}(\pmb{X}))\pmb{G}(\pmb{X})\pmb{H}(\pmb{X})+\pmb{F}(\pmb{X})\mathbb{d}(\pmb{G}(\pmb{X})\pmb{H}(\pmb{X}))
\\\\
&=
\mathbb{d}(\pmb{F}(\pmb{X}))\pmb{G}(\pmb{X})\pmb{H}(\pmb{X}) +\pmb{F}(\pmb{X})[\mathbb{d}(\pmb{G}(\pmb{X}))\pmb{H}(\pmb{X}) + \pmb{G}(\pmb{X})\mathbb{d}\pmb{H}(\pmb{X})]
\\\\
&=\mathbb{d}(\pmb{F}(\pmb{X}))\pmb{G}(\pmb{X})\pmb{H}(\pmb{X}) + \pmb{F}(\pmb{X})\mathbb{d}(\pmb{G}(\pmb{X}))\pmb{H}(\pmb{X})+ \pmb{F}(\pmb{X})\pmb{G}(\pmb{X})\mathbb{d}\pmb{H}(\pmb{X})
\end{align} \\\\ \tag{22_3_2_a}

Q.E.D.

d. Transpose rule

The matrix differential of a transpose equals the transpose of the matrix differential:

\mathbb{d}\pmb{F}^T_{p \times q}(\pmb{X})= (\mathbb{d}\pmb{F}_{p \times q}(\pmb{X}))^T \\\\ \tag{22_4_1}

Proof:

\begin{align} \mathbb{d}\pmb{F}^T_{p \times q}(\pmb{X})
&= \mathbb{d} \begin{bmatrix}
f_{11}(\pmb{X})& f_{21}(\pmb{X}) & \cdots & f_{p1}(\pmb{X}) \\ f_{12}(\pmb{X})& f_{22}(\pmb{X}) & \cdots & f_{p2}(\pmb{X}) \\
\vdots&\vdots&\vdots&\vdots \\
f_{1q}(\pmb{X})&f_{2q}(\pmb{X}) & \cdots & f_{pq}(\pmb{X})
\end{bmatrix}_{q \times p}
\\\\ &= \begin{bmatrix}
\mathbb{d}f_{11}(\pmb{X})& \mathbb{d}f_{21}(\pmb{X}) & \cdots & \mathbb{d}f_{p1}(\pmb{X}) \\ \mathbb{d}f_{12}(\pmb{X})& \mathbb{d}f_{22}(\pmb{X}) & \cdots & \mathbb{d}f_{p2}(\pmb{X}) \\
\vdots&\vdots&\vdots&\vdots \\
\mathbb{d}f_{1q}(\pmb{X})&\mathbb{d}f_{2q}(\pmb{X}) & \cdots & \mathbb{d}f_{pq}(\pmb{X})
\end{bmatrix}_{q \times p}
\\\\
&= \begin{bmatrix} \mathbb{d}f_{11}(\pmb{X})& \mathbb{d}f_{12}(\pmb{X}) & \cdots & \mathbb{d}f_{1q}(\pmb{X}) \\ \mathbb{d}f_{21}(\pmb{X})& \mathbb{d}f_{22}(\pmb{X}) & \cdots & \mathbb{d}f_{2q}(\pmb{X}) \\ \vdots&\vdots&\vdots&\vdots \\ \mathbb{d}f_{p1}(\pmb{X})& \mathbb{d}f_{p2}(\pmb{X}) & \cdots & \mathbb{d}f_{pq}(\pmb{X})
\end{bmatrix}_{p \times q}^T \\\\ &=
(\mathbb{d}\pmb{F}_{p \times q}(\pmb{X}))^T \end{align}
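\\\\ \tag{22_4_2}

Q.E.D.

3.2 Why use matrix differentials to find derivatives

\pmb{X}_{m \times n} is itself a real matrix function of the matrix variable \pmb{X}_{m \times n}: its entries are x_{ij}, and the total differential of each entry is \mathbb{d}x_{ij}. Hence the matrix differential of \pmb{X}_{m \times n} is

\begin{align} \mathbb{d}\pmb{X}_{m \times n} &=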
\begin{bmatrix} \mathbb{d}x_{11}& \mathbb{d}x_{12} & \cdots & \mathbb{d}x_{1n} \\
\mathbb{d}x_{21}& \mathbb{d}x_{22} & \cdots & \mathbb{d}x_{2n} \\
\vdots&\vdots&\vdots&\vdots \\
\mathbb{d}x_{m1}& \mathbb{d}x_{m2} & \cdots & \mathbb{d}x_{mn} \\
\end{bmatrix}_{m \times n}
\end{align}
\\\\ \tag{23_1}

The matrix differential of the vector \pmb{x}=[x_1,x_2,\cdots,x_n]^T is

\begin{align} \mathbb{d}\pmb{x} &=
\begin{bmatrix}
\mathbb{d}x_{1}\\
\mathbb{d}x_{2}\\
\vdots \\
\mathbb{d}x_{n} \\
\end{bmatrix}_{n \times 1}
\end{align}
\\\\ \tag{23_2}

The four matrix-differential rules above therefore also apply to \mathbb{d}\pmb{X}_{m \times n} and \mathbb{d}\pmb{x}.

Now return to the total differential of a real scalar function of a matrix variable, (20):

\begin{align} \mathbb{d}f(\pmb{X}) &=\frac{\partial f}{\partial x_{11}}\mathbb{d}x_{11}+\frac{\partial f}{\partial x_{12}}\mathbb{d}x_{12} + \cdots+\frac{\partial f}{\partial x_{1n}}\mathbb{d}x_{1n}\\ &+\frac{\partial f}{\partial x_{21}}\mathbb{d}x_{21}+\frac{\partial f}{\partial x_{22}}\mathbb{d}x_{22} + \cdots+\frac{\partial f}{\partial x_{2n}}\mathbb{d}x_{2n}\\ &+\cdots\\ &+\frac{\partial f}{\partial x_{m1}}\mathbb{d}x_{m1}+\frac{\partial f}{\partial x_{m2}}\mathbb{d}x_{m2} + \cdots+\frac{\partial f}{\partial x_{mn}}\mathbb{d}x_{mn}
\\\\
&=\mathbb{tr}( \begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}} \\ \frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{m2}}\\ \vdots&\vdots&\vdots&\vdots\\ \frac{\partial f} {\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial
x_{mn}}
\end{bmatrix}_{n\times m} \begin{bmatrix} \mathbb{d}x_{11}
& \mathbb{d}x_{12} & \cdots
& \mathbb{d}x_{1n} \\ \mathbb{d}x_{21}
& \mathbb{d}x_{22} & \cdots
& \mathbb{d}x_{2n} \\ \vdots&\vdots&\vdots&\vdots\\ \mathbb{d}x_{m1}
& \mathbb{d}x_{m2} & \cdots
& \mathbb{d}x_{mn}
\end{bmatrix}_{m \times n} ) \end{align}
\\\\ \tag{20}

Looking at (20), the left-hand matrix inside \mathbb{tr} is exactly equation (本质篇_9) of the Essence article:

\begin{align*} \text{D}_{\pmb{X}}f(\pmb{X})&= \frac{\partial f(\pmb{X})}{\partial \pmb{X}^T_{m\times n}} \\\\ &= \left[
\matrix{ \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}} \\ \frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}& \cdots & \frac{\partial f}{\partial x_{m2}}\\ \vdots&\vdots&\vdots&\vdots\\ \frac{\partial f} {\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial
x_{mn}} } \right]_{n\times m} \end{align*} \\\\
\tag{本质篇_9}

and the right-hand matrix is exactly (23_1):

\begin{align} \mathbb{d}\pmb{X}_{m \times n} &=
\begin{bmatrix} \mathbb{d}x_{11}& \mathbb{d}x_{12} & \cdots & \mathbb{d}x_{1n} \\
\mathbb{d}x_{21}& \mathbb{d}x_{22} & \cdots & \mathbb{d}x_{2n} \\
\vdots&\vdots&\vdots&\vdots \\
\mathbb{d}x_{m1}& \mathbb{d}x_{m2} & \cdots & \mathbb{d}x_{mn} \\
\end{bmatrix}_{m \times n}
\end{align}
\\\\ \tag{23_1}

Therefore the total differential of a real scalar function of a matrix variable, (20), can be written as

\begin{align} \mathbb{d}f(\pmb{X}) &=\mathbb{tr}(\frac{\partial f(\pmb{X})}{\partial\pmb{X}^T} \mathbb{d}\pmb{X})\end{align} \\\\ \tag{24}

Recall the goal: to find \frac{\partial f(\pmb{X})}{\partial \pmb{X}^T}. So once the total differential of a real scalar function of a matrix variable has been put in the form (24), the derivative can be read off directly. (It has been proved [9] that this result is unique: if \mathbb{d}f(\pmb{X}) =\mathbb{tr}(\pmb{A}_1\mathbb{d}\pmb{X}) = \mathbb{tr}(\pmb{A}_2\mathbb{d}\pmb{X}), then \pmb{A}_1=\pmb{A}_2.)

The total differential of a real scalar function of a vector variable, (18), can likewise be written as

\begin{align} \mathbb{d}f(\pmb{x}) &=\mathbb{tr}(\frac{\partial f(\pmb{x})}{\partial\pmb{x}^T} \mathbb{d}\pmb{x})\end{align} \\\\ \tag{25}

As pointed out in Section III, 2.5, 2.5.2 of the Essence article, when the matrix variable \pmb{X} is itself a column vector \pmb{x},

\frac{\partial f(\pmb{X})}{\partial \pmb{X}^T} = \frac{\partial f(\pmb{x})}{\partial \pmb{x}^T} \\\\ \tag{26}

Moreover, by (23_1) and (23_2), when \pmb{X} is itself a column vector \pmb{x} we also have

\mathbb{d}\pmb{X} = \mathbb{d}\pmb{x} \\\\ \tag{27}

So for a real scalar function of either a matrix variable or a vector variable, the derivative can be obtained from (24):

\begin{align} \mathbb{d}f(\pmb{X}) &=\mathbb{tr}(\frac{\partial f(\pmb{X})}{\partial\pmb{X}^T} \mathbb{d}\pmb{X})\end{align} \\\\ \tag{24}

How do we bring a differential into the form (24)? First, here are 3\times 2=6 formulas worth memorizing (they are used directly from now on).

3.2.1 [8] The sandwich rule

\mathbb{d}(\pmb{A}\pmb{X}\pmb{B})=\pmb{A}\mathbb{d}(\pmb{X})\pmb{B} \\\\ \tag{25_1_1}

where \pmb{A}_{p \times m},\pmb{B}_{n \times q} are constant matrices.

Proof: by the product rule (22_3_2),

\begin{align} \mathbb{d}(\pmb{A}\pmb{X}\pmb{B}) & =
\mathbb{d}(\pmb{A})\pmb{X}\pmb{B} + \pmb{A}\mathbb{d}({\pmb{X}})\pmb{B} + \pmb{A}\pmb{X}\mathbb{d}\pmb{B}
\end{align} \\\\ \tag{25_1_a}

By the constant-matrix rule (22_1),

\mathbb{d}\pmb{A} =\pmb{0}_{p \times m},\mathbb{d}\pmb{B} =\pmb{0}_{n \times q} \\\\ \tag{25_1_b}

so the first and third terms vanish. Q.E.D.

\pmb{X}_{m \times n} can be replaced by any other matrix function:

\mathbb{d}(\pmb{A}\pmb{F}(\pmb{X})\pmb{B})=\pmb{A}\mathbb{d}(\pmb{F}(\pmb{X}))\pmb{B} \\\\ \tag{25_1_2}

3.2.2 [10] Determinant

\mathbb{d}|\pmb{X}|= |\pmb{X}|\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) = \mathbb{tr}(|\pmb{X}|\pmb{X}^{-1}\mathbb{d}\pmb{X}) \\\\ \tag{25_2_1}

where \pmb{X}_{n \times n} is invertible.

Proof: the determinant is a real scalar function, so (24) applies. A determinant can be expanded along a row, that is, as the sum of each entry in the row times its cofactor [11]. Expanding along row i, the row containing x_{ij}:

|\pmb{X}|=x_{i1}A_{i1}+x_{i2}A_{i2}+\cdots+x_{in}A_{in} \\\\ \tag{25_2_a}

None of the cofactors A_{i1},\cdots,A_{in} involves the entries of row i, so the partial derivative of the determinant with respect to x_{ij} is the cofactor of that entry:

\frac{\partial |\pmb{X}|}{\partial x_{ij}} = A_{ij} \\\\ \tag{25_2_b}

Hence the derivative of the determinant with respect to the matrix is

\begin{align} \frac{\partial |\pmb{X}|}{\partial \pmb{X}^T} &= \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \\ \end{bmatrix} \end{align} \\\\ \tag{25_2_c}

This is exactly the adjugate matrix [12] \pmb{X}^*. Using the relation between the adjugate and the inverse [13],

\pmb{X}^{-1}=\frac{\pmb{X}^*}{|\pmb{X}|} \\\\ \tag{25_2_d}

and substituting into (24) gives

\begin{align} \mathbb{d}|\pmb{X}| &=\mathbb{tr}(\frac{\partial |\pmb{X}|}{\partial\pmb{X}^T} \mathbb{d}\pmb{X}) \\\\ &=\mathbb{tr}(|\pmb{X}|\pmb{X}^{-1}\mathbb{d}\pmb{X}) \end{align} \\\\

Since the determinant is a scalar, by (3) it can be pulled outside the trace:

\mathbb{d}|\pmb{X}|= |\pmb{X}|\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) = \mathbb{tr}(|\pmb{X}|\pmb{X}^{-1}\mathbb{d}\pmb{X}) \\\\ \tag{25_2_1}

Q.E.D.

\pmb{X}_{n \times n} can be replaced by any other matrix function [10]:

\mathbb{d}|\pmb{F}(\pmb{X})|= |\pmb{F}(\pmb{X})|\mathbb{tr}(\pmb{F}(\pmb{X})^{-1}\mathbb{d}\pmb{F}(\pmb{X})) = \mathbb{tr}(|\pmb{F}(\pmb{X})|\pmb{F}(\pmb{X})^{-1}\mathbb{d}\pmb{F}(\pmb{X})) \\\\ \tag{25_2_2}

3.2.3 [10] Inverse matrix

\mathbb{d}(\pmb{X}^{-1})=-\pmb{X}^{-1}\mathbb{d}(\pmb{X})\pmb{X}^{-1}
\\\\
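\\\\ \tag{25_3_1}

where \pmb{X}_{n \times n} is invertible.

Proof: since

\pmb{X}\pmb{X}^{-1}=\pmb{E} \\\\

where \pmb{E} is the identity matrix, and the matrix differential of a constant matrix is \pmb{0}, taking the matrix differential of both sides with the product rule (22_3_1) gives

\begin{equation} \mathbb{d}(\pmb{X})\pmb{X}^{-1}+\pmb{X}\mathbb{d}(\pmb{X}^{-1}) =\pmb{0} \\\\ \end{equation} \\\\ \tag{25_3_a}

Left-multiplying both sides by \pmb{X}^{-1} and moving the first term to the right-hand side gives the result. Q.E.D.

\pmb{X}_{n \times n} can be replaced by any other matrix function [10]:

\mathbb{d}(\pmb{F}(\pmb{X})^{-1})=-\pmb{F}(\pmb{X})^{-1}\mathbb{d}(\pmb{F}(\pmb{X}))\pmb{F}(\pmb{X})^{-1}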
\\\\
\\\\ \tag{25_3_2}

3.3 How to use matrix differentials to find derivatives

For a real scalar function f(\pmb{X}) we have \mathbb{tr}(f(\pmb{X})) =f(\pmb{X}) and \mathbb{d}f(\pmb{X})=\mathbb{tr}(\mathbb{d}f(\pmb{X})), so

\mathbb{d}f(\pmb{X}) = \mathbb{d}(\mathbb{tr}f(\pmb{X}))=\mathbb{tr}(\mathbb{d}f(\pmb{X})) \\\\ \tag{26}

If the real scalar function is itself the trace of some matrix function \pmb{F}_{p \times p}(\pmb{X}), i.e. \mathbb{tr}{\pmb{F}(\pmb{X})}, then by the linearity of the total differential (16_2),

\mathbb{d}(\mathbb{tr}{\pmb{F}_{p\times p}(\pmb{X})}) = \mathbb{d}(\sum_{i=1}^pf_{ii}(\pmb{X})) = \sum_{i=1}^p\mathbb{d}(f_{ii}(\pmb{X})) = \mathbb{tr}(\mathbb{d}\pmb{F}_{p \times p}(\pmb{X})) \\\\ \tag{27}

The numbers (26) and (27) are reused from Section 3.2; in the examples below they refer to the two identities of this section.

The six examples below show in full detail how to find derivatives with matrix differentials. The conclusions need not be memorized; what matters is being able to carry out the derivation, which can be redone whenever it is needed.

3.3.1 Example 1: equation (基础篇_31) of the Basics article

\frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}} = \pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X} \\\\ \tag{28}

Proof:

Step 1: write the differential in the form (26).

\mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})= \mathbb{tr}(\mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}))\\\\ \tag{29}

Step 2: use the matrix-differential rules (22_1) to (22_4_1), the trace properties (2) to (10) and the six basic formulas (25_1_1) to (25_3_2) to reduce (29) to the form (24).

By (25_1_2):

\begin{align} \mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b}) &= \mathbb{tr}(\mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})) \\\\ &= \mathbb{tr}(\pmb{a}^T\mathbb{d}(\pmb{X}\pmb{X}^T)\pmb{b})
\end{align}
\\\\ \tag{30}

By (22_3_1):

\begin{align} \mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})
&= \mathbb{tr}(\pmb{a}^T\mathbb{d}(\pmb{X}\pmb{X}^T)\pmb{b}) \\\\ &=
\mathbb{tr}[\pmb{a}^T(\mathbb{d}(\pmb{X})\pmb{X}^T+\pmb{X}\mathbb{d}\pmb{X}^T)\pmb{b}] \end{align}
\\\\ \tag{31}

By (3):

\begin{align} \mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})
&=
\mathbb{tr}[\pmb{a}^T(\mathbb{d}(\pmb{X})\pmb{X}^T+\pmb{X}\mathbb{d}\pmb{X}^T)\pmb{b}] \\\ &= \mathbb{tr}(\pmb{a}^T\mathbb{d}(\pmb{X})\pmb{X}^T\pmb{b})+\mathbb{tr}(\pmb{a}^T\pmb{X}\mathbb{d}(\pmb{X}^T)\pmb{b}) \end{align}
\\\\ \tag{32}

By (22_4_1):

\begin{align} \mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})
&= \mathbb{tr}(\pmb{a}^T\mathbb{d}(\pmb{X})\pmb{X}^T\pmb{b})+\mathbb{tr}(\pmb{a}^T\pmb{X}\mathbb{d}(\pmb{X}^T)\pmb{b}) \\\\ &= \mathbb{tr}(\pmb{a}^T\mathbb{d}(\pmb{X})\pmb{X}^T\pmb{b})+\mathbb{tr}(\pmb{a}^T\pmb{X}(\mathbb{d}\pmb{X})^T\pmb{b})
\end{align}
\\\\ \tag{33}

By (9) and (10):

\begin{align} \mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})
&= \mathbb{tr}(\pmb{a}^T\mathbb{d}(\pmb{X})\pmb{X}^T\pmb{b})+\mathbb{tr}(\pmb{a}^T\pmb{X}(\mathbb{d}\pmb{X})^T\pmb{b}) \\\\ &= \mathbb{tr}(\pmb{X}^T\pmb{b}\pmb{a}^T\mathbb{d}\pmb{X}) + \mathbb{tr}(\pmb{b}\pmb{a}^T\pmb{X}(\mathbb{d}\pmb{X})^T)\\\\
&= \mathbb{tr}(\pmb{X}^T\pmb{b}\pmb{a}^T\mathbb{d}\pmb{X}) + \mathbb{tr}((\pmb{b}\pmb{a}^T\pmb{X})^T\mathbb{d}\pmb{X})\\\\
&= \mathbb{tr}(\pmb{X}^T\pmb{b}\pmb{a}^T\mathbb{d}\pmb{X}) + \mathbb{tr}(\pmb{X}^T\pmb{a}\pmb{b}^T\mathbb{d}\pmb{X}) \end{align}
\\\\ \tag{34}

By (3):

\begin{align} \mathbb{d}(\pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})
&= \mathbb{tr}(\pmb{X}^T\pmb{b}\pmb{a}^T\mathbb{d}\pmb{X}) + \mathbb{tr}(\pmb{X}^T\pmb{a}\pmb{b}^T\mathbb{d}\pmb{X}) \\\\ &=
\mathbb{tr}((\pmb{X}^T\pmb{b}\pmb{a}^T+\pmb{X}^T\pmb{a}\pmb{b}^T)\mathbb{d}\pmb{X}) \end{align}
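\\\\ \tag{35}

Step 3: read off the result. Comparing (35) with (24) gives the derivative with respect to \pmb{X}^T, and transposing it gives the derivative with respect to \pmb{X}:

\begin{align}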
\frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}^T}} &=\pmb{X}^T\pmb{b}\pmb{a}^T+\pmb{X}^T\pmb{a}\pmb{b}^T \\\\
\frac{\partial( \pmb{a}^T\pmb{X}\pmb{X}^T\pmb{b})}{\partial{\pmb{X}}} &= \pmb{a}\pmb{b}^T\pmb{X}+\pmb{b}\pmb{a}^T\pmb{X}
\\\\
\end{align}
\\\\ \tag{36,28}

Q.E.D.

3.3.2 Example 2 [9]

\frac{\partial \mathbb{tr}(\pmb{X}^T\pmb{X})}{\partial \pmb{X}} = 2\pmb{X} \\\\ \tag{37}

Step 1: write the differential in the form (27).

\mathbb{d}(\mathbb{tr}(\pmb{X}^T\pmb{X})) =\mathbb{tr}(\mathbb{d}(\pmb{X}^T\pmb{X})) \\\\ \tag{38}

Step 2: use the matrix-differential rules (22_1) to (22_4_1), the trace properties (2) to (10) and the six basic formulas (25_1_1) to (25_3_2) to reduce (38) to the form (24).

By (22_3_1):

\begin{align} \mathbb{d}(\mathbb{tr}(\pmb{X}^T\pmb{X})) &=\mathbb{tr}(\mathbb{d}(\pmb{X}^T\pmb{X})) \\\\ &= \mathbb{tr}(\mathbb{d}(\pmb{X}^T)\pmb{X}+\pmb{X}^T\mathbb{d}\pmb{X}) \end{align} \\\\ \tag{39}

By (3):

\begin{align} \mathbb{d}(\mathbb{tr}(\pmb{X}^T\pmb{X})) &= \mathbb{tr}(\mathbb{d}(\pmb{X}^T)\pmb{X}+\pmb{X}^T\mathbb{d}\pmb{X}) \\\\ &= \mathbb{tr}(\mathbb{d}(\pmb{X}^T)\pmb{X})+\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X}) \end{align} \\\\

By (22_4_1):

\begin{align} \mathbb{d}(\mathbb{tr}(\pmb{X}^T\pmb{X})) &= \mathbb{tr}(\mathbb{d}(\pmb{X}^T)\pmb{X})+\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X}) \\\\ &= \mathbb{tr}((\mathbb{d}\pmb{X})^T\pmb{X})+\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X}) \end{align} \\\\

By (8) and (10):

\begin{align} \mathbb{d}(\mathbb{tr}(\pmb{X}^T\pmb{X})) &= \mathbb{tr}((\mathbb{d}\pmb{X})^T\pmb{X})+\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X}) \\\\ &= \mathbb{tr}(\pmb{X}(\mathbb{d}\pmb{X})^T)+\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X}) \\\\ &= \mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X})+\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X})\\\\ &= 2 \mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X}) \end{align}
\\\\
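The identity reached at this point, \mathbb{d}(\mathbb{tr}(\pmb{X}^T\pmb{X})) = 2\mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X}), can be checked numerically. The sketch below does so in plain Python for one 3 \times 2 matrix; the matrix entries, the perturbation and the helper names (`transpose`, `matmul`, `trace`) are assumptions chosen for illustration only.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace(M):
    """Sum of the main-diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def f(X):
    """f(X) = tr(X^T X)."""
    return trace(matmul(transpose(X), X))


# A 3 x 2 test matrix and a small perturbation dX (both assumed values).
X = [[1.0, -2.0], [0.5, 3.0], [4.0, 0.25]]
dX = [[1e-6, 2e-6], [-3e-6, 1e-6], [2e-6, -1e-6]]
X_plus = [[x + d for x, d in zip(rx, rd)] for rx, rd in zip(X, dX)]

actual_change = f(X_plus) - f(X)
predicted_change = 2 * trace(matmul(transpose(X), dX))

print(actual_change, predicted_change)
```

The difference between the two values is \mathbb{tr}((\mathbb{d}\pmb{X})^T\mathbb{d}\pmb{X}), which is of second order in the perturbation.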
By (3):

\begin{align} \mathbb{d}(\mathbb{tr}(\pmb{X}^T\pmb{X}))
&= 2 \mathbb{tr}(\pmb{X}^T\mathbb{d}\pmb{X}) \\\\ &=
\mathbb{tr}(2\pmb{X}^T\mathbb{d}\pmb{X})
\end{align}
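\\\\

Step 3: read off the result. Comparing with (24) gives the derivative with respect to \pmb{X}^T, and transposing it gives the derivative with respect to \pmb{X}:

\begin{align} \frac{\partial \mathbb{tr}(\pmb{X}^T\pmb{X})}{\partial \pmb{X}^T}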
&= 2\pmb{X}^T\\\\
\frac{\partial \mathbb{tr}(\pmb{X}^T\pmb{X})}{\partial \pmb{X}} &= 2\pmb{X}
\end{align}
\\\\ \tag{40}

3.3.3 Example 3 [14]

\frac{\partial \log|\pmb{X}|}{\partial \pmb{X}} = (\pmb{X}^{-1})^T \\\\ \tag{41}

where \pmb{X}_{n \times n}.

Step 1: write the differential in the form (26).

\begin{align} \mathbb{d}(\log|\pmb{X}|) = \mathbb{tr}(\mathbb{d}(\log|\pmb{X}|)) \end{align} \\\\ \tag{42}

Step 2: use the matrix-differential rules (22_1) to (22_4_1), the trace properties (2) to (10) and the six basic formulas (25_1_1) to (25_3_2) to reduce (42) to the form (24).

This is the total differential of a composite function: |\pmb{X}| is a function of several variables and \log u is a function of one variable. So by the first two equalities in (15), with z=\log|\pmb{X}|,u=|\pmb{X}|, we have

\begin{align} \mathbb{d}(\log|\pmb{X}|)
&= \mathbb{tr}(\mathbb{d}(\log|\pmb{X}|)) \\\\ &= \mathbb{tr}(\mathbb{d}z)
\\\\ &= \mathbb{tr}(\mathbb{d}(\log u)) \\\\ &= \mathbb{tr}(\frac{1}{u}(\mathbb{d}u) )\\\\ &= \mathbb{tr}(\frac{1}{|\pmb{X}|}\mathbb{d}|\pmb{X}|)\end{align}
\\\\ \tag{43}

By (25_2_1):

\begin{align} \mathbb{d}(\log|\pmb{X}|) &= \mathbb{tr}(\frac{1}{|\pmb{X}|}\mathbb{d}|\pmb{X}|) \\\\
&= \mathbb{tr}(\frac{1}{|\pmb{X}|}{|\pmb{X}|}\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}))
\end{align}
\\\\ \tag{44}

The trace of a scalar is the scalar itself, so by (2):

\begin{align} \mathbb{d}(\log|\pmb{X}|)
&= \mathbb{tr}(\frac{1}{|\pmb{X}|}{|\pmb{X}|}\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X})) \\\\
&= \frac{1}{|\pmb{X}|}{|\pmb{X}|}\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) \\\\ &= \mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) \end{align}
\\\\ \tag{45}

Step 3: read off the result.

\begin{align}
\frac{\partial \log|\pmb{X}|}{\partial \pmb{X}^T}
&= \pmb{X}^{-1}\\\\ \frac{\partial \log|\pmb{X}|}{\partial \pmb{X}}
&= (\pmb{X}^{-1})^T \end{align} \\\\ \tag{46}

3.3.4 Example 4 [14]

\frac{\partial |\pmb{X}^{-1}|}{\partial \pmb{X}} =-|\pmb{X}^{-1}|(\pmb{X}^{-1})^T \\\\ \tag{47}

where \pmb{X}_{n \times n}.

Step 1: write the differential in the form (26). By (25_2_2):

\begin{align} \mathbb{d}|\pmb{X}^{-1}| &= |\pmb{X}^{-1}|\mathbb{tr}((\pmb{X}^{-1})^{-1}\mathbb{d}(\pmb{X}^{-1})) \\\\ &= |\pmb{X}^{-1}|\mathbb{tr}(\pmb{X}\mathbb{d}(\pmb{X}^{-1})) \end{align} \\\\ \tag{48}

Step 2: use the matrix-differential rules (22_1) to (22_4_1), the trace properties (2) to (10) and the six basic formulas (25_1_1) to (25_3_2) to reduce (48) to the form (24).

By (25_3_1):

\begin{align} \mathbb{d}|\pmb{X}^{-1}| &= |\pmb{X}^{-1}|\mathbb{tr}(\pmb{X}\mathbb{d}(\pmb{X}^{-1})) \\\\ &= |\pmb{X}^{-1}|\mathbb{tr}(-\pmb{X}\pmb{X}^{-1}\mathbb{d}(\pmb{X})\pmb{X}^{-1}) \\\\ &= |\pmb{X}^{-1}|\mathbb{tr}(-\mathbb{d}(\pmb{X})\pmb{X}^{-1}) \end{align} \\\\ \tag{49}

By (3):

\begin{align} \mathbb{d}|\pmb{X}^{-1}| &= |\pmb{X}^{-1}|\mathbb{tr}(-\mathbb{d}(\pmb{X})\pmb{X}^{-1}) \\\\ &= -|\pmb{X}^{-1}|\mathbb{tr}(\mathbb{d}(\pmb{X})\pmb{X}^{-1}) \end{align} \\\\ \tag{50}

By (8):

\begin{align} \mathbb{d}|\pmb{X}^{-1}| &= -|\pmb{X}^{-1}|\mathbb{tr}(\mathbb{d}(\pmb{X})\pmb{X}^{-1}) \\\\ &= -|\pmb{X}^{-1}|\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) \end{align} \\\\

By (3):

\begin{align} \mathbb{d}|\pmb{X}^{-1}| &= -|\pmb{X}^{-1}|\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) \\\\ &= \mathbb{tr}(-|\pmb{X}^{-1}|\pmb{X}^{-1}\mathbb{d}(\pmb{X})) \end{align} \\\\ \tag{51}

Step 3: read off the result.

\begin{align} \frac{\partial |\pmb{X}^{-1}|}{\partial \pmb{X}^T} &=-|\pmb{X}^{-1}|\pmb{X}^{-1} \\\\ \frac{\partial |\pmb{X}^{-1}|}{\partial \pmb{X}} &=-|\pmb{X}^{-1}|(\pmb{X}^{-1})^T \end{align} \\\\ \tag{52}

3.3.5 Example 5 [15]

\begin{align}
\frac{\partial \mathbb{tr}(\pmb{X}+\pmb{A})^{-1}}{\partial \pmb{X}}
&=-((\pmb{X}+\pmb{A})^{-2})^T
\end{align}
\\\\ \tag{53}

where \pmb{A}_{n \times n} is a constant matrix, \pmb{X}_{n \times n}, and (\pmb{X}+\pmb{A})^{-2}=(\pmb{X}+\pmb{A})^{-1}(\pmb{X}+\pmb{A})^{-1}.

Step 1: write the differential in the form (27).

\begin{align}
\mathbb{d} (\mathbb{tr}(\pmb{X}+\pmb{A})^{-1}) &= \mathbb{tr}(\mathbb{d}(\pmb{X}+\pmb{A})^{-1})
\end{align}
\\\\ \tag{54}

Step 2: use the matrix-differential rules (22_1) to (22_4_1), the trace properties (2) to (10) and the six basic formulas (25_1_1) to (25_3_2) to reduce (54) to the form (24).

By (25_3_2):

\begin{align}
\mathbb{d} (\mathbb{tr}(\pmb{X}+\pmb{A})^{-1}) &= \mathbb{tr}(\mathbb{d}(\pmb{X}+\pmb{A})^{-1})\\\\ &= \mathbb{tr}(-(\pmb{X}+\pmb{A})^{-1}(\mathbb{d}(\pmb{X}+\pmb{A}))(\pmb{X}+\pmb{A})^{-1})
\end{align}
\\\\ \tag{55}

By (9):

\begin{align}
\mathbb{d} (\mathbb{tr}(\pmb{X}+\pmb{A})^{-1})
&= \mathbb{tr}(-(\pmb{X}+\pmb{A})^{-1}(\mathbb{d}(\pmb{X}+\pmb{A}))(\pmb{X}+\pmb{A})^{-1}) \\\\ &= \mathbb{tr}(-(\pmb{X}+\pmb{A})^{-1}(\pmb{X}+\pmb{A})^{-1}\mathbb{d}(\pmb{X}+\pmb{A})) \\\\ &= \mathbb{tr}(-(\pmb{X}+\pmb{A})^{-2}\mathbb{d}(\pmb{X}+\pmb{A}))
\end{align}
\\\\ \tag{56}

By (22_2):

\begin{align}
\mathbb{d} (\mathbb{tr}(\pmb{X}+\pmb{A})^{-1})
&= \mathbb{tr}(-(\pmb{X}+\pmb{A})^{-2}\mathbb{d}(\pmb{X}+\pmb{A})) \\\\ &= \mathbb{tr}(-(\pmb{X}+\pmb{A})^{-2}(\mathbb{d}\pmb{X}+\mathbb{d}\pmb{A})) \end{align}
\\\\ \tag{57}

By (22_1):

\begin{align}
\mathbb{d} (\mathbb{tr}(\pmb{X}+\pmb{A})^{-1})
&= \mathbb{tr}(-(\pmb{X}+\pmb{A})^{-2}(\mathbb{d}\pmb{X}+\mathbb{d}\pmb{A})) \\\\ &= \mathbb{tr}(-(\pmb{X}+\pmb{A})^{-2}\mathbb{d}\pmb{X})
\end{align}
\\\\ \tag{58}

Step 3: read off the result.

\begin{align}
\frac{\partial \mathbb{tr}(\pmb{X}+\pmb{A})^{-1}}{\partial \pmb{X}^T}
&=-(\pmb{X}+\pmb{A})^{-2} \\\\
\frac{\partial \mathbb{tr}(\pmb{X}+\pmb{A})^{-1}}{\partial \pmb{X}}
&=-((\pmb{X}+\pmb{A})^{-2})^T
\end{align}
\\\\ \tag{59}

3.3.6 Example 6 [15]

\begin{align} \frac{\partial|\pmb{X}^3|}{\partial \pmb{X}} &=\frac{\partial|\pmb{X}|^3}{\partial \pmb{X}} =3|\pmb{X}|^3(\pmb{X}^{-1})^T = 3|\pmb{X}^3|(\pmb{X}^{-1})^T \end{align} \\\\ \tag{60}

Step 1: write the differential in the form (26).

For n \times n matrices \pmb{A},\pmb{B} we have |\pmb{A}\pmb{B}|=|\pmb{A}||\pmb{B}|. Hence

|\pmb{X}^3|= |\pmb{X}\pmb{X}\pmb{X}| = |\pmb{X}||\pmb{X}||\pmb{X}| = |\pmb{X}|^3 \\\\ \tag{61}

so

\begin{align} \mathbb{d}|\pmb{X}^3| =\mathbb{d}(|\pmb{X}|^3)= \mathbb{tr}(\mathbb{d}(|\pmb{X}|^3)) \end{align} \\\\ \tag{62}

Step 2: use the matrix-differential rules (22_1) to (22_4_1), the trace properties (2) to (10) and the six basic formulas (25_1_1) to (25_3_2) to reduce (62) to the form (24).

This is the total differential of a composite function: |\pmb{X}| is a function of several variables and u^3 is a function of one variable. So by the first two equalities in (15), with z=|\pmb{X}|^3,u=|\pmb{X}|, we have

\begin{align}
\mathbb{d}(\mathbb{tr}(|\pmb{X}|^3)) &= \mathbb{tr}(\mathbb{d}(|\pmb{X}|^3))
\\\\ &= \mathbb{tr}(\mathbb{d}z) \\\\ &= \mathbb{tr}(\mathbb{d}(u^3)) \\\\ &= \mathbb{tr}(3u^2\mathbb{d}u) \\\\ &= \mathbb{tr}(3|\pmb{X}|^2\mathbb{d}|\pmb{X}|) \end{align}
\\\\ \tag{63}

By (25_2_1):

\begin{align}
\mathbb{d}(\mathbb{tr}(|\pmb{X}|^3))
&= \mathbb{tr}(3|\pmb{X}|^2\mathbb{d}|\pmb{X}|) \\\\ &= \mathbb{tr}(3|\pmb{X}|^2|\pmb{X}|\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) ) \\\\ &= \mathbb{tr}(3|\pmb{X}|^3\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) ) \end{align}
\\\\ \tag{64}

The trace of a scalar is the scalar itself, so by (2):

\begin{align}
\mathbb{d}(\mathbb{tr}(|\pmb{X}|^3))
&= \mathbb{tr}(3|\pmb{X}|^3\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X}) ) \\\\ &= 3|\pmb{X}|^3\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X})\end{align}
\\\\ \tag{65}

By (3):

\begin{align}
\mathbb{d}(\mathbb{tr}(|\pmb{X}|^3)) &= 3|\pmb{X}|^3\mathbb{tr}(\pmb{X}^{-1}\mathbb{d}\pmb{X})\\\\ &= \mathbb{tr}(3|\pmb{X}|^3\pmb{X}^{-1}\mathbb{d}\pmb{X}) \\\\ &= \mathbb{tr}(3|\pmb{X}^3|\pmb{X}^{-1}\mathbb{d}\pmb{X}) \end{align}
\\\
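The differential form just obtained, \mathbb{d}|\pmb{X}^3| = \mathbb{tr}(3|\pmb{X}|^3\pmb{X}^{-1}\mathbb{d}\pmb{X}), can be checked numerically before reading off the derivative. The sketch below uses a 2 \times 2 matrix so that the determinant and inverse can be written out by hand in plain Python; the matrix, the perturbation and the helper names (`det2`, `inv2`, `matmul`, `trace`) are assumptions made for illustration.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix stored as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def inv2(M):
    """Inverse of an invertible 2 x 2 matrix (adjugate divided by determinant)."""
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]


def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace(M):
    """Sum of the main-diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


def f(X):
    """f(X) = |X^3|, computed directly as the determinant of X X X."""
    return det2(matmul(matmul(X, X), X))


# An invertible test matrix (|X| = 2.5) and a small perturbation, both assumed.
X = [[2.0, 1.0], [0.5, 1.5]]
dX = [[1e-6, -2e-6], [3e-6, 1e-6]]
X_plus = [[x + d for x, d in zip(rx, rd)] for rx, rd in zip(X, dX)]

actual_change = f(X_plus) - f(X)
predicted_change = 3 * det2(X) ** 3 * trace(matmul(inv2(X), dX))

print(actual_change, predicted_change)
```

The two values should agree up to second-order terms in the perturbation; a larger mismatch would point to an error in the derivation or in the chosen test matrix.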
Step 3: read off the result.

\begin{align}
\frac{\partial|\pmb{X}^3|}{\partial \pmb{X}^T} &=\frac{\partial|\pmb{X}|^3}{\partial \pmb{X}^T}
=3|\pmb{X}|^3\pmb{X}^{-1} = 3|\pmb{X}^3|\pmb{X}^{-1} \\\\
\frac{\partial|\pmb{X}^3|}{\partial \pmb{X}} &=\frac{\partial|\pmb{X}|^3}{\partial \pmb{X}}
=3|\pmb{X}|^3(\pmb{X}^{-1})^T = 3|\pmb{X}^3|(\pmb{X}^{-1})^T \end{align}
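\\\\ \tag{66}

IV. Conclusion

This series ends here. Every first-order derivative of a real scalar function of a matrix or vector variable that we have come across can be computed with the method of this article. I have not yet encountered higher-order derivatives or derivatives of real matrix functions of a matrix variable; if I do, I will consider writing further articles.

Other articles in this matrix-derivative series (by Iterator, on Zhihu):

Supplement (补充篇): 对称矩阵的求导,以多元正态分布的极大似然估计为例(矩阵求导——补充篇), on differentiating with respect to symmetric matrices, using maximum-likelihood estimation of the multivariate normal distribution as the example.
Basics article (基础篇): 矩阵求导公式的数学推导(矩阵求导——基础篇), on the mathematical derivation of the matrix-derivative formulas.
Essence article (本质篇): 矩阵求导的本质与分子布局、分母布局的本质(矩阵求导——本质篇), on the essence of matrix differentiation and of the numerator and denominator layouts.

References

[1] Zhang Xianda (张贤达), 《矩阵分析与应用(第二版)》 (Matrix Analysis and Applications, 2nd ed.), p. 50.
[2] 《高等数学 同济大学第七版 上册》 (Advanced Mathematics, Tongji University, 7th ed., Vol. 1), p. 111.
[3] 《高等数学 同济大学第七版 上册》, p. 115.
[4] 《高等数学 同济大学第七版 下册》 (Vol. 2), p. 72.
[5] 《高等数学 同济大学第七版 下册》, p. 114.
[6] Zhang Xianda, 《矩阵分析与应用(第二版)》, p. 154.
[7] Zhang Xianda, 《矩阵分析与应用(第二版)》, p. 155.
[8] Zhang Xianda, 《矩阵分析与应用(第二版)》, p. 152.
[9] Zhang Xianda, 《矩阵分析与应用(第二版)》, p. 156.
[10] Zhang Xianda, 《矩阵分析与应用(第二版)》, p. 153.
[11] 《工程数学线性代数 同济大学第六版》 (Engineering Mathematics: Linear Algebra, Tongji University, 6th ed.), p. 17.
[12] 《工程数学线性代数 同济大学第六版》, p. 38.
[13] 《工程数学线性代数 同济大学第六版》, p. 40.
[14] Zhang Xianda, 《矩阵分析与应用(第二版)》, p. 160.
[15] Zhang Xianda, 《矩阵分析与应用(第二版)》, p. 158.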
