Deriving the Backpropagation Algorithm

2019-01-19 16:17:45 +08:00
 MoModel

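Backpropagation (abbreviated BP), short for "backward propagation of errors", is a common method for training artificial neural networks, used together with an optimization method such as gradient descent. It computes the gradient of the loss function with respect to every weight in the network. That gradient is passed to the optimization method, which uses it to update the weights so as to minimize the loss function.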

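Suppose you have the following network:

The first layer is the input layer, with two neurons $i_1$ and $i_2$ and a bias term $b_1$. The second layer is the hidden layer, with two neurons $h_1$ and $h_2$ and a bias term $b_2$. The third layer is the output layer, with $o_1$ and $o_2$. Each connection between layers carries a weight $w_i$, and the activation function is the sigmoid function throughout.

Now assign initial values:

Input data: $i_1 = 0.05$, $i_2 = 0.10$;

Output data (targets): $o_1 = 0.01$, $o_2 = 0.99$;

Initial weights:
$w_1 = 0.15$, $w_2 = 0.20$, $w_3 = 0.25$, $w_4 = 0.30$; $w_5 = 0.40$, $w_6 = 0.45$, $w_7 = 0.50$, $w_8 = 0.55$;

Goal: given the inputs $i_1$, $i_2$ (0.05 and 0.10), make the network's outputs as close as possible to the target outputs $o_1$, $o_2$ (0.01 and 0.99).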

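Forward pass

1. Input layer ----> hidden layer:

Compute the weighted input sum of neuron $h_1$:

$$net_{h1} = w_1 * i_1 + w_2 * i_2 + b_1 * 1$$

$$net_{h1} = 0.15 * 0.05 + 0.2 * 0.1 + 0.35 * 1 = 0.3775$$

Compute the output $out_{h1}$ of neuron $h_1$ (the activation function here is the sigmoid):

$$out_{h1} = \frac{1}{1+e^{-net_{h1}}} = 0.5932$$

In the same way, compute the output $out_{h2}$ of neuron $h_2$:

$$out_{h2} = 0.5968$$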
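The input-to-hidden step above can be sketched in a few lines of Python. This is a minimal sketch, not the author's code: it assumes that $w_3$ and $w_4$ are the weights into $h_2$ (the text only spells out $h_1$) and that both hidden neurons share the bias $b_1 = 0.35$.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

# Inputs, weights and bias as given in the post.
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
b1 = 0.35

# Weighted input sums of the two hidden neurons.
# Assumption: w3 and w4 feed h2, mirroring how w1 and w2 feed h1.
net_h1 = w1 * i1 + w2 * i2 + b1 * 1
net_h2 = w3 * i1 + w4 * i2 + b1 * 1

# Hidden-layer outputs after the sigmoid.
out_h1 = sigmoid(net_h1)
out_h2 = sigmoid(net_h2)

print(f"net_h1={net_h1:.4f}  out_h1={out_h1:.4f}  out_h2={out_h2:.4f}")
```

The post truncates its results to four decimal places rather than rounding them, so the last printed digit may differ by one from the values quoted above.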

2. Hidden layer ----> output layer:

$$net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1$$

$$out_{o1} = \frac{1}{1+e^{-net_{o1}}} = 0.7514$$

In the same way, compute the output of neuron $o_2$:

$$out_{o2} = 0.7729$$
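Both forward steps share the same shape, so they can be sketched with one small helper. The value $b_2 = 0.60$ is an assumption: the text never states it, but it is the value that reproduces $out_{o1} = 0.7514$. The helper name `layer` and the assignment of $w_7$, $w_8$ to $o_2$ are likewise illustrative assumptions.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, bias):
    """Sigmoid outputs of one fully connected layer.

    weights[j] holds the weights going into neuron j; all neurons in the
    layer share the same bias, as in the post.
    """
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + bias)
            for row in weights]

# Input -> hidden, with b1 = 0.35 as used in the post.
hidden = layer([0.05, 0.10], [[0.15, 0.20], [0.25, 0.30]], 0.35)

# Hidden -> output. b2 = 0.60 is an assumed value (not stated in the text).
output = layer(hidden, [[0.40, 0.45], [0.50, 0.55]], 0.60)

out_o1, out_o2 = output
print(f"out_o1={out_o1:.4f}  out_o2={out_o2:.4f}")
```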

Backward pass

Now we can carry out the backpropagation computation.

1. Compute the total error

$$E_{total} = E_{o1} + E_{o2}$$

Compute the errors of $o_1$ and $o_2$ separately:

$$E_{o1} = \frac{1}{2} (target_{o1} - out_{o1})^2 = 0.2748$$

$$E_{o2} = \frac{1}{2} (target_{o2} - out_{o2})^2 = 0.0235$$

$$E_{total} = E_{o1} + E_{o2} = 0.2983$$
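As a quick check of the arithmetic, the error terms can be recomputed as below. This sketch uses the forward-pass outputs rounded to four decimal places, so the results agree with the figures above only to about three places. The helper name `squared_error` is illustrative.

```python
def squared_error(target: float, out: float) -> float:
    """Half squared error for a single output neuron."""
    return 0.5 * (target - out) ** 2

# Targets from the post; outputs are the forward-pass results rounded to 4 places.
target_o1, target_o2 = 0.01, 0.99
out_o1, out_o2 = 0.7514, 0.7729

e_o1 = squared_error(target_o1, out_o1)
e_o2 = squared_error(target_o2, out_o2)
e_total = e_o1 + e_o2

print(f"E_o1={e_o1:.4f}  E_o2={e_o2:.4f}  E_total={e_total:.4f}")
```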

2. Hidden layer ----> output layer weight update:

Take the weight $w_5$ as an example. To find out how much $w_5$ contributes to the total error, take the partial derivative of the total error with respect to $w_5$ using the chain rule:

$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial w_5}$$

[Figure: how the error propagates backward from $E_{total}$ through $out_{o1}$ and $net_{o1}$ to $w_5$]

We compute each factor in turn:

Compute $\frac{\partial E_{total}}{\partial out_{o1}}$

$$E_{total} = \frac{1}{2}(target_{o1} - out_{o1})^2 + \frac{1}{2}(target_{o2} - out_{o2})^2$$

$$\frac{\partial E_{total}}{\partial out_{o1}} = -(target_{o1} - out_{o1}) = 0.7414$$

Compute $\frac{\partial out_{o1}}{\partial net_{o1}}$

$$out_{o1} = \frac{1}{1+e^{-net_{o1}}}$$

$$\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}(1 - out_{o1}) = 0.1868$$
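The identity $out_{o1}(1 - out_{o1})$ follows from differentiating the sigmoid directly. A short derivation, writing $x$ for $net_{o1}$ (a shorthand introduced only for this step):

$$
\begin{aligned}
\frac{d}{dx}\left(\frac{1}{1+e^{-x}}\right)
  &= \frac{e^{-x}}{(1+e^{-x})^2} \\
  &= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} \\
  &= out_{o1}\,(1 - out_{o1}),
\end{aligned}
$$

since $1 - out_{o1} = \frac{e^{-x}}{1+e^{-x}}$. With $out_{o1} = 0.7514$ this gives $0.7514 * 0.2486 \approx 0.1868$.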
 
 

Compute $\frac{\partial net_{o1}}{\partial w_5}$

$$net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1$$

$$\frac{\partial net_{o1}}{\partial w_5} = out_{h1} = 0.5932$$

Finally, multiply the three factors together:

$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial w_5} = 0.082$$
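One way to gain confidence in a hand-derived gradient is to compare it with a finite-difference estimate. The sketch below computes the three chain-rule factors for $w_5$ and checks their product against a central difference of $E_{total}$. It assumes $b_2 = 0.60$ (not stated in the text, but consistent with $out_{o1} = 0.7514$) and the wiring $w_3, w_4 \to h_2$, $w_7, w_8 \to o_2$. The function names are illustrative.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

I1, I2 = 0.05, 0.10      # inputs
T1, T2 = 0.01, 0.99      # targets
B1, B2 = 0.35, 0.60      # B2 is an assumed value, see the note above

def forward(w5: float):
    """Forward pass with every weight fixed at its initial value except w5."""
    out_h1 = sigmoid(0.15 * I1 + 0.20 * I2 + B1)
    out_h2 = sigmoid(0.25 * I1 + 0.30 * I2 + B1)
    out_o1 = sigmoid(w5 * out_h1 + 0.45 * out_h2 + B2)
    out_o2 = sigmoid(0.50 * out_h1 + 0.55 * out_h2 + B2)
    return out_h1, out_o1, out_o2

def total_error(w5: float) -> float:
    _, out_o1, out_o2 = forward(w5)
    return 0.5 * (T1 - out_o1) ** 2 + 0.5 * (T2 - out_o2) ** 2

w5 = 0.40
out_h1, out_o1, _ = forward(w5)

# The three chain-rule factors, in the order used in the text.
dE_dout = -(T1 - out_o1)            # dE_total / dout_o1
dout_dnet = out_o1 * (1 - out_o1)   # dout_o1 / dnet_o1
dnet_dw5 = out_h1                   # dnet_o1 / dw5
analytic = dE_dout * dout_dnet * dnet_dw5

# Central finite difference as an independent check.
eps = 1e-6
numeric = (total_error(w5 + eps) - total_error(w5 - eps)) / (2 * eps)

print(f"analytic={analytic:.6f}  numeric={numeric:.6f}")
```

If the two numbers disagree beyond a small tolerance, one of the factors (or the sign) in the derivation is usually wrong.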

Looking at the formula above, we find that:

$$\frac{\partial E_{total}}{\partial w_5} = -(target_{o1} - out_{o1}) * out_{o1}(1 - out_{o1}) * out_{h1}$$

For convenience, let $\delta_{o1}$ denote the error term of the output layer:

$$\delta_{o1} = \frac{\partial E_{total}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}}$$

$$\delta_{o1} = -(target_{o1} - out_{o1}) * out_{o1}(1 - out_{o1})$$

$$\frac{\partial E_{total}}{\partial w_5} = \delta_{o1} * out_{h1}$$

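Update the value of $w_5$, where $\eta$ is the learning rate (the result below corresponds to $\eta = 0.5$):

$$w_5^+ = w_5 - \eta * \frac{\partial E_{total}}{\partial w_5} = 0.3589$$

In the same way, update $w_6$, $w_7$, $w_8$:

$$w_6^+ = 0.4086$$

$$w_7^+ = 0.5113$$

$$w_8^+ = 0.5614$$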
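The four output-layer updates can be sketched with the $\delta$ notation as follows. This is a rough illustration under stated assumptions: the learning rate $\eta = 0.5$ is inferred from the numbers rather than stated in the text, the wiring of $w_5$ to $w_8$ is assumed as commented, and the forward-pass values are rounded to four decimal places, so results match the post to about three or four places.

```python
# Forward-pass values rounded to four decimal places.
out_h1, out_h2 = 0.5932, 0.5968
out_o = {"o1": 0.7514, "o2": 0.7729}
target = {"o1": 0.01, "o2": 0.99}
eta = 0.5  # assumed learning rate, inferred from w5+ = 0.3589

def output_delta(t: float, out: float) -> float:
    """delta = dE_total/dout * dout/dnet for a sigmoid output neuron."""
    return -(t - out) * out * (1 - out)

delta_o1 = output_delta(target["o1"], out_o["o1"])
delta_o2 = output_delta(target["o2"], out_o["o2"])

# Assumed wiring: w5 (h1 -> o1), w6 (h2 -> o1), w7 (h1 -> o2), w8 (h2 -> o2).
old = {"w5": 0.40, "w6": 0.45, "w7": 0.50, "w8": 0.55}
grad = {
    "w5": delta_o1 * out_h1,
    "w6": delta_o1 * out_h2,
    "w7": delta_o2 * out_h1,
    "w8": delta_o2 * out_h2,
}

# Gradient-descent step for each weight.
new = {name: old[name] - eta * grad[name] for name in old}

for name, value in new.items():
    print(f"{name}+ = {value:.4f}")
```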

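3. Input layer ----> hidden layer weight update:

We can compute the updates for $w_1$, $w_2$, $w_3$, $w_4$ in much the same way, with one change.

When computing the partial derivative of the total error with respect to $w_5$ above, the path was:

$out_{o1}$ -> $net_{o1}$ -> $w_5$

For the input-to-hidden weights, the path is:

$out_{h1}$ -> $net_{h1}$ -> $w_1$

Because $out_{h1}$ feeds both $o_1$ and $o_2$, it receives error from both $E_{o1}$ and $E_{o2}$, so both contributions have to be computed.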

Compute $\frac{\partial E_{total}}{\partial out_{h1}}$

$$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$$

First compute $\frac{\partial E_{o1}}{\partial out_{h1}}$

$$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial out_{h1}}$$

$$\frac{\partial E_{o1}}{\partial net_{o1}} = \frac{\partial E_{o1}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} = 0.1385$$

$$net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1$$

$$\frac{\partial net_{o1}}{\partial out_{h1}} = w_5 = 0.40$$

$$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial out_{h1}} = 0.138 * 0.4 = 0.055$$

In the same way, we get

$$\frac{\partial E_{o2}}{\partial out_{h1}} = -0.019$$

Adding the two gives the total:

$$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}} = 0.036$$
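The sum over both output neurons can be sketched as below. The sketch assumes that $w_7$ is the weight from $h_1$ to $o_2$ (the text only shows the $w_5$ branch explicitly) and uses values rounded to four decimal places, so it agrees with the figures above to about three places.

```python
# Values rounded to four decimal places.
delta_o1 = 0.1385                    # dE_o1/dnet_o1, as computed above
out_o2, target_o2 = 0.7729, 0.99
w5, w7 = 0.40, 0.50                  # assumed: w5 is h1 -> o1, w7 is h1 -> o2

# dE_o2/dnet_o2, built the same way as delta_o1.
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)

# Each output neuron sends its delta back through the weight that connects it to h1.
dE_o1_dout_h1 = delta_o1 * w5
dE_o2_dout_h1 = delta_o2 * w7

dE_total_dout_h1 = dE_o1_dout_h1 + dE_o2_dout_h1

print(f"dE_o1/dout_h1 = {dE_o1_dout_h1:.4f}")
print(f"dE_o2/dout_h1 = {dE_o2_dout_h1:.4f}")
print(f"dE_total/dout_h1 = {dE_total_dout_h1:.4f}")
```

The second term is negative because $out_{o2}$ is below its target, so increasing $out_{h1}$ would reduce $E_{o2}$ while increasing $E_{o1}$.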

Next compute $\frac{\partial out_{h1}}{\partial net_{h1}}$

$$out_{h1} = \frac{1}{1+e^{-net_{h1}}}$$

$$\frac{\partial out_{h1}}{\partial net_{h1}} = out_{h1} * (1 - out_{h1}) = 0.2413$$

Then compute $\frac{\partial net_{h1}}{\partial w_1}$

$$net_{h1} = w_1 * i_1 + w_2 * i_2 + b_1 * 1$$

$$\frac{\partial net_{h1}}{\partial w_1} = i_1 = 0.05$$

Finally, multiply the three factors together:

$$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} * \frac{\partial out_{h1}}{\partial net_{h1}} * \frac{\partial net_{h1}}{\partial w_1}$$

$$\frac{\partial E_{total}}{\partial w_1} = 0.036 * 0.2413 * 0.05 = 0.000438$$
 

We update the value of $w_1$:

$$w_1^+ = w_1 - \eta * \frac{\partial E_{total}}{\partial w_1} = 0.1498$$

In the same way, update $w_2$, $w_3$, $w_4$:

$$w_2^+ = 0.1996$$

$$w_3^+ = 0.2498$$

$$w_4^+ = 0.2995$$

This completes one pass of error backpropagation. We then run the forward pass again with the updated weights and keep iterating.
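Putting the steps together, one iteration and the repeated loop might look like the sketch below. It is not the code linked from this post. It assumes $b_2 = 0.60$ and $\eta = 0.5$ (both inferred from the numbers, not stated in the text), keeps the biases fixed, and assumes the wiring $w_3, w_4 \to h_2$, $w_6$: $h_2 \to o_1$, $w_7$: $h_1 \to o_2$, $w_8$: $h_2 \to o_2$. The name `train_step` is illustrative.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def train_step(w, inputs, targets, b1=0.35, b2=0.60, eta=0.5):
    """One forward and backward pass.

    w is [w1, ..., w8]. Returns (new_weights, total_error_before_update).
    b2 and eta are assumed values; the biases are not updated.
    """
    i1, i2 = inputs
    w1, w2, w3, w4, w5, w6, w7, w8 = w

    # Forward pass.
    out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
    out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
    error = 0.5 * (targets[0] - out_o1) ** 2 + 0.5 * (targets[1] - out_o2) ** 2

    # Output-layer deltas: dE_total / dnet_o.
    d_o1 = -(targets[0] - out_o1) * out_o1 * (1 - out_o1)
    d_o2 = -(targets[1] - out_o2) * out_o2 * (1 - out_o2)

    # Hidden-layer deltas: sum the error arriving from both outputs,
    # using the output weights from before the update.
    d_h1 = (d_o1 * w5 + d_o2 * w7) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w6 + d_o2 * w8) * out_h2 * (1 - out_h2)

    # Gradient for each weight is delta times the value entering that weight.
    grads = [d_h1 * i1, d_h1 * i2, d_h2 * i1, d_h2 * i2,
             d_o1 * out_h1, d_o1 * out_h2, d_o2 * out_h1, d_o2 * out_h2]
    return [wi - eta * g for wi, g in zip(w, grads)], error

x, t = (0.05, 0.10), (0.01, 0.99)
weights = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]

# A single step, for comparison with the hand-computed updates.
first_step, initial_error = train_step(weights, x, t)
print("after one step:", [round(v, 4) for v in first_step])

# Keep iterating; `error` is measured at the start of the last step.
for _ in range(10000):
    weights, error = train_step(weights, x, t)
print(f"error at first step: {initial_error:.4f}, at last step: {error:.6f}")
```

Note that all eight gradients are computed from the old weights before any of them is changed, which matches the order of the derivation above.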

Full code (view on a PC): http://www.momodel.cn:8899/#/explore/5b84e0098fe30b727acaa360?type=app

Mo (website: momodel.cn) is an online AI modeling platform that supports Python and helps you quickly develop, train, and deploy AI applications. We look forward to having you join us.

3 replies
nical
2019-01-21 19:23:54 +08:00
Impressive, very helpful.
MoModel
2019-01-21 20:20:33 +08:00
@nical Sorry, many of the formulas came out garbled. Please open http://www.momodel.cn:8899/#/explore/5b84e0098fe30b727acaa360?type=app on a PC to view the source.
MoModel
2019-04-28 09:46:21 +08:00
