Note that this C++ extension is named test_cpp, which means that in Python the C++ functions can be called through the test_cpp module.
Step 3
In the cpu directory, run the following command to compile and install the C++ code:
python setup.py install
After that you will see a stream of build output, and the C++ module will be installed into Python's site-packages.
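For reference, the setup.py that drives this build is usually only a few lines. The sketch below is an assumption about what it could look like for this tutorial (the source file name test.cpp is hypothetical; the module name test_cpp comes from the text); it relies on PyTorch's torch.utils.cpp_extension build helpers:

```python
# setup.py -- a minimal sketch, assuming the C++ source is in test.cpp
# next to this script (adjust the file name to your own layout).
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name='test_cpp',
    ext_modules=[
        # CppExtension compiles the sources against the PyTorch headers
        CppExtension(name='test_cpp', sources=['test.cpp']),
    ],
    # BuildExtension supplies the right compiler/linker flags for PyTorch
    cmdclass={'build_ext': BuildExtension},
)
```

Running `python setup.py install` in that directory then builds the extension and places the resulting test_cpp module in site-packages.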
Once the steps above are done, the C++ code can be called from Python. In PyTorch, by convention we first wrap the C++ forward and backward passes in a Function op (the following code goes in test.py):
from torch.autograd import Function

import test_cpp


class TestFunction(Function):
    @staticmethod
    def forward(ctx, x, y):
        return test_cpp.forward(x, y)

    @staticmethod
    def backward(ctx, gradOutput):
        gradX, gradY = test_cpp.backward(gradOutput)
        return gradX, gradY
In this way, we have effectively embedded the C++ extension's functions into PyTorch's own framework.
I looked at the code of this Function class and found it to be quite interesting:
class Function(with_metaclass(FunctionMeta, _C._FunctionBase,
                              _ContextMethodMixin, _HookMixin)):
    ...

    @staticmethod
    def forward(ctx, *args, **kwargs):
        r"""Performs the operation.

        This function is to be overridden by all subclasses.

        It must accept a context ctx as the first argument, followed by any
        number of arguments (tensors or other types).

        The context can be used to store tensors that can be then retrieved
        during the backward pass.
        """
        raise NotImplementedError

    @staticmethod
    def backward(ctx, *grad_outputs):
        r"""Defines a formula for differentiating the operation.

        This function is to be overridden by all subclasses.

        It must accept a context :attr:`ctx` as the first argument, followed by
        as many outputs did :func:`forward` return, and it should return as
        many tensors, as there were inputs to :func:`forward`. Each argument is
        the gradient w.r.t the given output, and each returned value should be
        the gradient w.r.t. the corresponding input.

        The context can be used to retrieve tensors saved during the forward
        pass. It also has an attribute :attr:`ctx.needs_input_grad` as a tuple
        of booleans representing whether each input needs gradient. E.g.,
        :func:`backward` will have ``ctx.needs_input_grad[0] = True`` if the
        first input to :func:`forward` needs gradient computed w.r.t. the
        output.
        """
        raise NotImplementedError
Note the rules for implementing backward. The interface takes two kinds of arguments: ctx is a helper context object, and grad_outputs is the list of gradients flowing in from the layer above (upstream in backpropagation). The number of gradients in this list equals the number of values forward returns, which is exactly what the chain rule demands: the upstream gradients of every relevant output must be multiplied with (and summed into) the local gradients of the current layer. In turn, backward must return a gradient for every input of forward: if forward takes n arguments, backward must return n gradients, one per input. So in the example above, our backward receives a single argument (forward returns only one value) and returns two gradients (forward receives two input tensors).
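This counting rule can be checked with plain Python, with no C++ extension involved. The sketch below (the function names are hypothetical, not part of the tutorial's code) mimics the shape of the example op: forward takes two inputs and returns one output, so backward takes one upstream gradient and returns two gradients. Using multiplication z = x * y as the op, the local gradients are dz/dx = y and dz/dy = x:

```python
# A hand-rolled stand-in for the autograd pattern above (no torch needed):
# forward has two inputs and one output, so backward takes one upstream
# gradient and must return two gradients, one per input.

def forward(x, y):
    # op: z = x * y; also return what backward will need (the "ctx" role)
    z = x * y
    saved = (x, y)
    return z, saved

def backward(grad_output, saved):
    # chain rule: dL/dx = dL/dz * dz/dx = grad_output * y, likewise for y
    x, y = saved
    grad_x = grad_output * y
    grad_y = grad_output * x
    return grad_x, grad_y

z, saved = forward(3.0, 4.0)           # z = 12.0
grad_x, grad_y = backward(1.0, saved)  # one grad in, two grads out
print(z, grad_x, grad_y)               # → 12.0 4.0 3.0
```

In the real TestFunction, the body of these two functions is simply replaced by the calls into test_cpp.forward and test_cpp.backward, while ctx plays the role of the saved tuple.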