In PyTorch, the core of all neural networks is the autograd package.
My personal understanding (3 steps): set requires_grad = True to start recording, call backward() to run the backward pass, then read the gradient from .grad.
demo1:
Given $y = 3x^2$ and $z = 2y + 1$, find $\frac{dz}{dx}$ at $x = 2$:

$$\frac{dz}{dx} = 2 \cdot 3 \cdot 2 \cdot x = 12x, \qquad \left.\frac{dz}{dx}\right|_{x=2} = 24$$
#demo1: requires_grad = True — recording starts
import torch
x = torch.tensor([2.], requires_grad=True)
x
tensor([2.], requires_grad=True)
y = x*x*3
print(y)
tensor([12.], grad_fn=<MulBackward0>)
z =2* y+1
z
tensor([25.], grad_fn=<AddBackward0>)
#demo1: run the backward pass for z
z.backward()
#demo1: dz/dx is stored in x.grad
x.grad
tensor([24.])
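As a cross-check (my own minimal sketch, not part of the original demo), torch.autograd.grad computes the same derivative directly and returns it instead of accumulating it into x.grad:

import torch

x = torch.tensor([2.], requires_grad=True)
y = 3 * x * x                     # y = 3x^2
z = 2 * y + 1                     # z = 2y + 1
# returns dz/dx as a tuple instead of writing it into x.grad
(grad_x,) = torch.autograd.grad(z, x)
print(grad_x)                     # tensor([24.]), i.e. 12x at x = 2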
3. Some extensions
# specify requires_grad when creating the tensor
import torch
a = torch.randn(3,4, requires_grad=True)
# or
a = torch.randn(3,4).requires_grad_()
# or
a = torch.randn(3,4)
a.requires_grad=True
a
tensor([[ 0.7728, -1.3390, -0.3797, -0.0128],
[ 1.6523, 0.6181, -1.7606, -1.0674],
[ 0.6788, 1.3278, 0.7995, 0.3913]], requires_grad=True)
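A small additional note (my own sketch): requires_grad propagates through operations, so any result computed from a tracked tensor is tracked too and carries a grad_fn:

import torch

a = torch.randn(3, 4, requires_grad=True)
b = torch.randn(3, 4)                  # requires_grad=False by default
c = a + b
print(c.requires_grad, c.grad_fn)      # True <AddBackward0 ...>
print(b.requires_grad, b.grad_fn)      # False None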
.detach()
The .detach() method separates a tensor from its computation history and prevents its future computation from being tracked, e.g. input_B = output_A.detach().
It returns a new tensor that shares the same data memory as the original but is excluded from gradient computation, i.e. requires_grad=False.
Because the two tensors share the same memory, modifying the value of one also changes the other:
x = torch.tensor([1.],requires_grad=True)
print(x.requires_grad)
print(x)
y = x.detach()
print("x,y地址:",x.data_ptr(),y.data_ptr())
print(y.requires_grad)
# modifying y also changes x
y[0]=23
print(x)
print(y)
True
tensor([1.], requires_grad=True)
x, y data pointers: 3067429570112 3067429570112
False
tensor([23.], requires_grad=True)
tensor([23.])
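If what you need is a copy that neither tracks gradients nor shares memory, a common pattern (my own sketch, not part of the original demo) is .detach().clone():

x = torch.tensor([1.], requires_grad=True)
y = x.detach().clone()                    # no grad tracking, and new storage
y[0] = 23
print(x)                                  # tensor([1.], requires_grad=True) -- unchanged
print(x.data_ptr() == y.data_ptr())       # False: separate memory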
z=3*x*x
print(z)
z.backward()
x.grad
tensor([1587.], grad_fn=<MulBackward0>)
tensor([138.])
z=3*y*y
print(z)
z.backward()
y.grad
tensor([1587.])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [78]
      1 z=3*y*y
      2 print(z)
----> 3 z.backward()
      4 y.grad
...
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Error: y has no grad_fn, i.e. tracking never started on it (it was detached).
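If you actually want gradients with respect to the detached tensor, one option (my own sketch) is to start a fresh graph from it with requires_grad_(); the backward pass then reaches y but not x, because the graph was cut at detach():

import torch

x = torch.tensor([23.], requires_grad=True)
y = x.detach()               # cut from x's graph, still shares data with x
y.requires_grad_()           # start tracking y as a new leaf from here on
z = 3 * y * y
z.backward()
print(y.grad)                # tensor([138.]) = 6 * y
print(x.grad)                # None: the backward pass never reaches x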
torch.no_grad()
Inside a with torch.no_grad(): block, gradient tracking is temporarily disabled:
print(x.requires_grad)
print((x**2).requires_grad)
with torch.no_grad():
    print((x**2).requires_grad)
True
True
False
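In practice, torch.no_grad() is most often used to wrap evaluation or inference code so that no graph is built at all; a minimal sketch with a hypothetical model:

import torch

model = torch.nn.Linear(4, 2)       # hypothetical small model
inputs = torch.randn(8, 4)

model.eval()
with torch.no_grad():               # nothing inside this block is recorded
    outputs = model(inputs)
print(outputs.requires_grad)        # False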
tensor.data
If you want to modify a tensor's values without the change being recorded by autograd (i.e. without affecting backpropagation), you can operate on tensor.data:
x = torch.ones(1,requires_grad=True)
print(x.data) # still a tensor
print(x.data.requires_grad) # but already independent of the computation graph
y = 2 * x
x.data *= 100 # only changes the value; not recorded in the graph, so it does not affect gradient propagation
y.backward()
print(x) # changing .data also changes the tensor's value
print(x.grad)
tensor([1.])
False
tensor([100.], requires_grad=True)
tensor([2.])
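One caution about .data versus .detach() (my own note, hedged): in-place changes made through .data are invisible to autograd's safety checks, so if the modified value is actually needed by backward, the gradient can be silently wrong:

import torch

x = torch.ones(1, requires_grad=True)
y = x * x                    # backward needs the saved value of x
x.data *= 100                # not recorded, and not detected by autograd
y.backward()
print(x.grad)                # tensor([200.]): uses the modified x, not the original x = 1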
Given $y = x^2$ and $z = 3y^2$, find $\frac{dz}{dx}$:
import torch
x = torch.ones(2, 2, requires_grad=True)
print(x)
tensor([[1., 1.],
[1., 1.]], requires_grad=True)
y = x**2
z = y * y * 3
print(z)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<MulBackward0>)
z.backward()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [83]
----> 1 z.backward()
...
RuntimeError: grad can be implicitly created only for scalar outputs
Reason for the error: when calling y.backward(), if y is a scalar, no argument needs to be passed to backward(); otherwise a Tensor with the same shape as y must be passed in.
v = torch.tensor([[1.,0.1],[1.,1.]], dtype=torch.float)
x.grad.zero_()
z.backward(v)
print(x.grad)
tensor([[12.0000, 1.2000],
[12.0000, 12.0000]])
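How to read this result (my own note): z.backward(v) computes a vector-Jacobian product, so each entry of x.grad is the elementwise derivative $\frac{dz}{dx} = 6y \cdot 2x = 12x^3$ weighted by v; with x = 1 everywhere this is 12·v, which matches the output above. A quick check:

print(12 * x.detach()**3 * v)    # tensor([[12.0000,  1.2000], [12.0000, 12.0000]])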
out = z.mean()
print(z, out)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<MulBackward0>) tensor(3., grad_fn=<MeanBackward0>)
x.grad.zero_()
out.backward()
print(x.grad)
tensor([[3., 3.],
[3., 3.]])
The reason x.grad.zero_() is needed above is that grad is accumulated during backpropagation: each backward pass adds the new gradient to the one already stored, so the gradient is usually zeroed before running backward again.
out2 = x.sum()
out2.backward()
print(x.grad)
out3 = x.sum()
x.grad.data.zero_()
out3.backward()
print(x.grad)
tensor([[4., 4.],
[4., 4.]])
tensor([[1., 1.],
[1., 1.]])
After out.backward(), x.grad was:
tensor([[3., 3.],
        [3., 3.]])
out2.backward() should have produced
tensor([[1., 1.],
        [1., 1.]])
but because the gradient was not zeroed first, the accumulated result is
tensor([[4., 4.],
        [4., 4.]])
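This accumulation behavior is exactly why training loops usually zero the gradients on every iteration; a minimal sketch with a hypothetical model, optimizer, and random data:

import torch

model = torch.nn.Linear(4, 1)          # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for step in range(5):
    inputs, targets = torch.randn(8, 4), torch.randn(8, 1)
    optimizer.zero_grad()              # clear gradients accumulated in the last step
    loss = loss_fn(model(inputs), targets)
    loss.backward()                    # .grad is (re)populated on each parameter
    optimizer.step()                   # update parameters with the fresh gradients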