The backward function in PyTorch

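backward computes derivatives by back-propagation, applying the chain rule. If y is not a scalar, you must also pass grad_tensors. That is the name used by torch.autograd.backward; the method Tensor.backward calls the same argument gradient. Its shape must match the shape of y.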
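The sketch below passes the same upstream gradient through both entry points, using the keyword names of a recent PyTorch release. The input values 2 and 3 are arbitrary. It shows that both calls produce the same x.grad.

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = x * x                      # non-scalar output, shape [2]
v = torch.ones_like(y)         # upstream gradient, same shape as y

# Method form: the keyword is `gradient`.
y.backward(gradient=v, retain_graph=True)
grad_method = x.grad.clone()

# Function form: the keyword is `grad_tensors`.
x.grad = None                  # reset so the two results are not added together
torch.autograd.backward(y, grad_tensors=v)
grad_function = x.grad.clone()

print(grad_method, grad_function)
```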

import torch
x = torch.tensor([16.0], requires_grad=True)  # track gradients for x (replaces the deprecated Variable wrapper)
y = x * x
y.backward()
print(x.grad)

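The output is:

tensor([32.])

y holds a single element, so it counts as a scalar and backward needs no extra argument. Called without an argument, backward differentiates a scalar output and implicitly uses 1 as its gradient. When an argument is supplied, it acts as the gradient passed back from the following layer, and by the chain rule it is simply multiplied in.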
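The sketch below checks this claim with illustrative values. For a one-element y, calling backward() with no argument gives the same x.grad as passing an explicit gradient of ones. Passing 3 instead scales the result by 3, as the chain rule predicts.

```python
import torch

x = torch.tensor([16.0], requires_grad=True)

# Implicit upstream gradient: allowed because y has exactly one element.
y = x * x
y.backward()
implicit = x.grad.clone()

# Explicit upstream gradient of 1, same shape as y.
x.grad = None
y = x * x                          # rebuild the graph (the previous one was freed)
y.backward(torch.ones_like(y))
explicit = x.grad.clone()

# Explicit upstream gradient of 3: the chain rule multiplies it in.
x.grad = None
y = x * x
y.backward(torch.full_like(y, 3.0))
scaled = x.grad.clone()

print(implicit, explicit, scaled)
```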

import torch
x = torch.tensor([1.0, 16.0], requires_grad=True)  # track gradients for x
y = x * x
y.backward()  # fails: y has two elements
print(x.grad)

raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
Here y = x * x = [1, 256] is not a scalar, so backward raises this error.

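The fix is to pass the corresponding argument, for example y.backward(torch.Tensor([value, value])). Next, consider what this argument actually means. It is the gradient passed back from the layer that consumes y, and by the chain rule it has to be multiplied in.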
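A short sketch reproducing both cases with the same x. The call without an argument raises the RuntimeError. Passing an upstream gradient of ones, shaped like y, succeeds. Here torch.tensor([1.0, 1.0]) stands in for the placeholder [value, value].

```python
import torch

x = torch.tensor([1.0, 16.0], requires_grad=True)
y = x * x                                  # y = [1., 256.], two elements

error_message = ""
try:
    y.backward()                           # no gradient given for a non-scalar output
except RuntimeError as err:
    error_message = str(err)

# Upstream gradient with the same shape as y: every element weighted by 1.
y.backward(torch.tensor([1.0, 1.0]))
print(error_message)
print(x.grad)                              # 2 * x * 1
```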

import torch
x = torch.tensor([1.0, 5.0, 6.0, 10.0, 16.0], requires_grad=True)  # track gradients for x
y = x * x

weights1 = torch.ones(5)  # upstream gradient, same shape as y
y.backward(weights1, retain_graph=True)
print(x.grad)

The output is:

tensor([ 2., 10., 12., 20., 32.])
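Differentiating y gives dy/dx = 2 * x, as expected. weights1 is 5-dimensional, the same shape as y. Because weights1 is passed in, the gradient stored in x.grad is

x.grad = 2 * x * weights1 = 2 * [1. * 1, 5. * 1, 6. * 1, 10. * 1, 16. * 1] = [2. * 1, 10. * 1, 12. * 1, 20. * 1, 32. * 1]

This shows that weights1 assigns a weight to each element being differentiated. That weight is the gradient passed back from the next layer. By the chain rule, the gradient of this layer is multiplied by the gradient coming from the layer after it. Here is another example to make this clearer.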

import torch
x = torch.tensor([1.0, 5.0, 6.0, 10.0, 16.0], requires_grad=True)  # track gradients for x
y = x * x

weights1 = torch.tensor([1.0, 0.1, 0.1, 0.5, 0.1])  # a different weight per element
y.backward(weights1, retain_graph=True)
print(x.grad)

The output is:

tensor([ 2.0000, 1.0000, 1.2000, 10.0000, 3.2000])
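Again dy/dx = 2 * x, but now weights1 gives each element of x a different weight. That weight is the gradient passed back from the next layer, and by the chain rule the gradient of this layer is multiplied by it:

x.grad = 2 * x * weights1 = 2 * [1. * 1, 5. * 0.1, 6. * 0.1, 10. * 0.5, 16. * 0.1] = [2. * 1, 10. * 0.1, 12. * 0.1, 20. * 0.5, 32. * 0.1]
= [2.0000, 1.0000, 1.2000, 10.0000, 3.2000]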
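The sketch below compares three computations to show why the weights behave like a gradient from the next layer. The downstream layer loss = sum(weights1 * y) is a hypothetical construction. It was chosen because its gradient with respect to y is exactly weights1.

```python
import torch

x_values = [1.0, 5.0, 6.0, 10.0, 16.0]
weights1 = torch.tensor([1.0, 0.1, 0.1, 0.5, 0.1])

# (a) Pass weights1 directly as the upstream gradient of y.
x = torch.tensor(x_values, requires_grad=True)
y = x * x
y.backward(weights1)
grad_direct = x.grad.clone()

# (b) Hypothetical next layer: loss = sum(weights1 * y).
#     d loss / d y = weights1, so the chain rule gives the same x.grad.
x = torch.tensor(x_values, requires_grad=True)
loss = (weights1 * (x * x)).sum()
loss.backward()
grad_via_loss = x.grad.clone()

# (c) Closed form from the text: 2 * x * weights1.
grad_manual = 2 * torch.tensor(x_values) * weights1

print(grad_direct)
print(torch.allclose(grad_direct, grad_via_loss), torch.allclose(grad_direct, grad_manual))
```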
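backward also takes a retain_graph argument. By default, the intermediate results the graph saved for the backward pass are freed once backward finishes. Calling backward a second time on the same graph therefore raises an error. retain_graph=True keeps the graph so it can be differentiated again. Each backward call also adds its result to x.grad instead of overwriting it; this accumulation is PyTorch's default behavior. As a result, repeated calls with retain_graph add up: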
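A minimal sketch, assuming PyTorch's default behavior of freeing saved tensors after backward. Without retain_graph, a second backward through the same graph fails. Rebuilding the graph each time still accumulates into x.grad. So the summing comes from x.grad itself, not from retain_graph.

```python
import torch

x = torch.tensor([1.0, 5.0, 6.0, 10.0, 16.0], requires_grad=True)
v = torch.ones(5)

# 1) Without retain_graph the tensors saved for x * x are freed after backward,
#    so a second pass through the same graph raises a RuntimeError.
y = x * x
y.backward(v)
try:
    y.backward(v)
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

# 2) Accumulation does not depend on retain_graph: rebuild the graph each time
#    and x.grad still sums the results.
x.grad = None
for _ in range(2):
    y = x * x               # fresh graph on every iteration
    y.backward(v)

print(second_call_failed)
print(x.grad)               # 2 * (2 * x)
```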

import torch
x = torch.tensor([1.0, 5.0, 6.0, 10.0, 16.0], requires_grad=True)  # track gradients for x
y = x * x

weights0 = torch.ones(5)
y.backward(weights0, retain_graph=True)  # keep the graph for further backward calls
print(x.grad)

weights1 = torch.tensor([0.1, 0.1, 0.1, 0.1, 0.1])
y.backward(weights1, retain_graph=True)  # result is added to x.grad
print(x.grad)

weights2 = torch.tensor([0.5, 0.1, 0.1, 0.1, 0.2])
y.backward(weights2, retain_graph=True)
print(x.grad)

The output is:
tensor([ 2., 10., 12., 20., 32.])
tensor([ 2.2000, 11.0000, 13.2000, 22.0000, 35.2000])
tensor([ 3.2000, 12.0000, 14.4000, 24.0000, 41.6000])
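Below, g0, g1 and g2 denote x.grad after each call.

Backward with weights0: g0 = 2 * x * weights0 = [2., 10., 12., 20., 32.]
Because retain_graph=True kept the graph, backward can run again, and its result is added to x.grad.
Backward with weights1: g1 = g0 + 2 * x * weights1 = [2., 10., 12., 20., 32.] + [0.2, 1., 1.2, 2., 3.2] = [2.2, 11., 13.2, 22., 35.2]
retain_graph=True is used again, so the graph is still available.
Backward with weights2: g2 = g1 + 2 * x * weights2 = [2.2, 11., 13.2, 22., 35.2] + [1., 1., 1.2, 2., 6.4] = [3.2, 12., 14.4, 24., 41.6]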
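The sketch below checks these running sums. It repeats the three calls and compares x.grad with the sum of 2 * x * weights so far. The second loop sets x.grad to None before each call. This is the usual way to get each gradient on its own rather than accumulated.

```python
import torch

x = torch.tensor([1.0, 5.0, 6.0, 10.0, 16.0], requires_grad=True)
y = x * x
weights = [
    torch.ones(5),                              # weights0
    torch.tensor([0.1, 0.1, 0.1, 0.1, 0.1]),   # weights1
    torch.tensor([0.5, 0.1, 0.1, 0.1, 0.2]),   # weights2
]

# Accumulating: each call adds 2 * x * w to x.grad.
expected = torch.zeros(5)
for w in weights:
    y.backward(w, retain_graph=True)
    expected += 2 * x.detach() * w
    print(x.grad, torch.allclose(x.grad, expected))
accumulated = x.grad.clone()

# Non-accumulating: clear x.grad before every call.
separate = []
for w in weights:
    x.grad = None
    y.backward(w, retain_graph=True)
    separate.append(x.grad.clone())
print(separate)
```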
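So with retain_graph, every subsequent backward call adds to the existing gradient. This resembles gradient accumulation in training: the gradients of several batches are summed and the parameters are updated once, rather than after every batch. In that setting each batch builds a new graph, so retain_graph is not required; the summing comes from .grad accumulating.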
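A minimal sketch of that training pattern. It assumes a hypothetical toy model (one nn.Linear layer), random data and a plain SGD optimizer. Each mini-batch builds its own graph, so no retain_graph is needed. Each per-batch loss is divided by the number of batches, so the accumulated gradient equals the full-batch gradient. The final comparison checks this.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy setup: one linear layer, 8 random samples, 4 equal mini-batches.
model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(8, 3)
targets = torch.randn(8, 1)
accumulation_steps = 4

# Reference: gradient of the loss over all 8 samples at once.
model.zero_grad()
torch.nn.functional.mse_loss(model(inputs), targets).backward()
full_batch_grad = model.weight.grad.clone()

# Gradient accumulation: several backward calls, one optimizer step.
optimizer.zero_grad()
for xb, yb in zip(inputs.chunk(accumulation_steps), targets.chunk(accumulation_steps)):
    loss = torch.nn.functional.mse_loss(model(xb), yb)
    # Each batch builds a new graph, so retain_graph is not needed;
    # dividing turns the sum of batch means into the overall mean.
    (loss / accumulation_steps).backward()
accumulated_grad = model.weight.grad.clone()

optimizer.step()       # single parameter update for all mini-batches
optimizer.zero_grad()  # clear .grad before the next accumulation round

print(torch.allclose(full_batch_grad, accumulated_grad, atol=1e-5))
```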
References:

https://pytorch.org/docs/stable/autograd.html?highlight=backward#torch.autograd.backward

Original article: https://blog.csdn.net/m0_50617544/article/details/120638499
