Our goal is to update the gradient of each leaf node. By the chain rule for derivatives of composite functions, the gradients of the leaf nodes are straightforward to work out:

$$\frac{\partial z}{\partial x}=\frac{\partial z}{\partial y}\frac{\partial y}{\partial x}=w$$

$$\frac{\partial z}{\partial w}=\frac{\partial z}{\partial y}\frac{\partial y}{\partial w}=x$$

$$\frac{\partial z}{\partial b}=1$$
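The chain-rule results above can be checked numerically. The sketch below uses central finite differences on $z = wx + b$ (with $y = wx$ as the intermediate node); the concrete values of `x`, `w`, and `b` are illustrative assumptions, not from the text.

```python
# Finite-difference check of the chain-rule gradients for z = w*x + b.
def z(x, w, b):
    return w * x + b

x, w, b = 2.0, 3.0, 1.5   # illustrative values
eps = 1e-6

# Central differences approximate each partial derivative.
dz_dx = (z(x + eps, w, b) - z(x - eps, w, b)) / (2 * eps)
dz_dw = (z(x, w + eps, b) - z(x, w - eps, b)) / (2 * eps)
dz_db = (z(x, w, b + eps) - z(x, w, b - eps)) / (2 * eps)

print(dz_dx, dz_dw, dz_db)  # ≈ w, x, 1
```

Each numerical estimate should match the analytic result ($w$, $x$, and $1$ respectively) to within floating-point error.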
Define the loss function, assuming a batch size of 100:

$$Loss=\frac{1}{2}\sum^{100}_{i=1}{(wx_i^2+b-y_i)^2}$$

Differentiating the loss function:

$$\frac{\partial{Loss}}{\partial{w}}=\sum^{100}_{i=1}{(wx_i^2+b-y_i)x_i^2}$$

$$\frac{\partial{Loss}}{\partial{b}}=\sum^{100}_{i=1}{(wx_i^2+b-y_i)}$$
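The two gradient formulas can likewise be verified against finite differences of the loss itself. In this sketch the batch data and the parameter values are synthetic assumptions made only for the check.

```python
import numpy as np

# Synthetic batch of 100 samples (assumed values, for verification only).
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = rng.standard_normal(100)
w, b = 0.7, -0.3

def loss(w, b):
    # Loss = 1/2 * sum((w*x_i^2 + b - y_i)^2)
    return 0.5 * np.sum((w * x**2 + b - y) ** 2)

# Analytic gradients from the formulas above.
grad_w = np.sum((w * x**2 + b - y) * x**2)
grad_b = np.sum(w * x**2 + b - y)

# Central-difference estimates of the same gradients.
eps = 1e-6
num_w = (loss(w + eps, b) - loss(w - eps, b)) / (2 * eps)
num_b = (loss(w, b + eps) - loss(w, b - eps)) / (2 * eps)

print(grad_w, num_w)  # should agree closely
print(grad_b, num_b)
```

If the derivation is correct, the analytic and numerical gradients agree to several decimal places.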
Learn the parameters with gradient descent, using learning rate $lr$:

$$w_1 \mathrel{-}= lr*\frac{\partial{Loss}}{\partial{w}}$$

$$b_1 \mathrel{-}= lr*\frac{\partial{Loss}}{\partial{b}}$$
```python
lr = 0.001  # learning rate
for i in range(800):
    # forward pass
    y_pred = np.power(x, 2) * w1 + b1
    # compute the loss
    loss = 0.5 * (y_pred - y) ** 2
    loss = loss.sum()
    # compute gradients
    grad_w = np.sum((y_pred - y) * np.power(x, 2))
    grad_b = np.sum((y_pred - y))
    # gradient descent step to minimize the loss
    w1 -= lr * grad_w
    b1 -= lr * grad_b
```
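The loop above assumes `x`, `y`, `w1`, and `b1` already exist. A self-contained sketch of the same training procedure follows; the synthetic data, the true parameter values, and the zero initialization are all assumptions added for illustration.

```python
import numpy as np

# Synthetic data: targets generated from the quadratic model y = w*x^2 + b.
np.random.seed(0)
true_w, true_b = 2.0, 5.0          # assumed "ground-truth" parameters
x = np.random.rand(100)            # batch of 100 samples
y = true_w * x**2 + true_b

w1, b1 = 0.0, 0.0                  # assumed initial parameters
lr = 0.001                         # learning rate
for i in range(800):
    y_pred = np.power(x, 2) * w1 + b1             # forward pass
    loss = 0.5 * np.sum((y_pred - y) ** 2)        # batch loss
    grad_w = np.sum((y_pred - y) * np.power(x, 2))
    grad_b = np.sum((y_pred - y))
    w1 -= lr * grad_w                             # gradient descent step
    b1 -= lr * grad_b

print(w1, b1)  # should approach true_w, true_b
```

Because the data is noise-free, 800 steps at this learning rate are enough for the learned `w1` and `b1` to land close to the generating parameters.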