TorchScript Tutorial

Introduction to TorchScript

TorchScript is an intermediate representation of a PyTorch model (a subclass of nn.Module) that can then be run in a high-performance environment such as C++.

This tutorial covers the following:

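PyTorch basics
- Modules
- Defining forward functions
- Composing modules into a hierarchy of modules
Methods for converting PyTorch modules to TorchScript, our high-performance deployment runtime:
- Tracing an existing module
- Using scripting to directly compile a module
- How to compose both approaches
- Saving and loading TorchScript modules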

PyTorch Model Authoring Basics

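A Module definition basically contains the following:

A constructor, which prepares the module for invocation
A set of Parameters and sub-Modules, initialized by the constructor, such as torch.nn.Linear
A forward function, which is the code that runs when the module is invoked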

Here is an example model:

import torch

class MyDecisionGate(torch.nn.Module):
    def forward(self, x):
        if x.sum() > 0:
            return x
        else:
            return -x

class MyCell(torch.nn.Module):
    def __init__(self):
        super(MyCell, self).__init__()
        self.dg = MyDecisionGate()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x, h):
        new_h = torch.tanh(self.dg(self.linear(x)) + h)
        return new_h, new_h

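my_cell = MyCell()
# Example inputs: a batch of 3 vectors of size 4, plus a hidden state of the same shape
x, h = torch.rand(3, 4), torch.rand(3, 4)
print(my_cell)
print(my_cell(x, h))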

# Output:
MyCell(
  (dg): MyDecisionGate()
  (linear): Linear(in_features=4, out_features=4, bias=True)
)
(tensor([[-0.0657,  0.1869,  0.8526,  0.3125],
        [ 0.6072,  0.7615,  0.6674,  0.7230],
        [ 0.0875,  0.7908,  0.6205,  0.5743]], grad_fn=<TanhBackward>), tensor([[-0.0657,  0.1869,  0.8526,  0.3125],
        [ 0.6072,  0.7615,  0.6674,  0.7230],
        [ 0.0875,  0.7908,  0.6205,  0.5743]], grad_fn=<TanhBackward>))

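Here, torch.nn.Linear is itself a Module: like our own classes, it subclasses Module. Printing a Module displays its hierarchy, so you can see its submodules and parameters.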
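The hierarchy can also be inspected from code. The sketch below repeats the MyCell definition so it runs on its own and walks it with the standard nn.Module methods named_children() (direct sub-Modules) and named_parameters() (all parameters, recursively). MyDecisionGate has no parameters, so only the Linear layer's weight and bias appear.

```python
import torch

class MyDecisionGate(torch.nn.Module):
    def forward(self, x):
        if x.sum() > 0:
            return x
        else:
            return -x

class MyCell(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.dg = MyDecisionGate()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x, h):
        new_h = torch.tanh(self.dg(self.linear(x)) + h)
        return new_h, new_h

my_cell = MyCell()

# Direct sub-Modules, in the order they were registered in __init__
for name, child in my_cell.named_children():
    print(name, type(child).__name__)   # dg MyDecisionGate / linear Linear

# Parameters of the whole hierarchy, with dotted names
for name, p in my_cell.named_parameters():
    print(name, tuple(p.shape))         # linear.weight (4, 4) / linear.bias (4,)
```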

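The output also contains grad_fn. This is a detail of PyTorch's automatic differentiation (autograd), and it means we can compute derivatives of this computation if needed.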

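We also added MyDecisionGate, a module that uses control flow. Control flow covers constructs such as loops and if statements; MyDecisionGate uses an if statement.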

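Many frameworks compute symbolic derivatives given a full program representation. PyTorch instead uses a gradient tape: it records operations as they occur and replays them backwards when computing derivatives. This way, the framework does not have to explicitly define derivatives for every construct in the language.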
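A minimal sketch of the tape idea using standard autograd calls. The function y = x² + 3x and the value x = 2 are arbitrary choices for illustration: the operations are recorded as they run, and backward() replays them in reverse to compute dy/dx = 2x + 3, which is 7 at x = 2.

```python
import torch

# A leaf tensor whose operations autograd should record
x = torch.tensor(2.0, requires_grad=True)

# Each operation is recorded as it executes; y.grad_fn is the last recorded node
y = x * x + 3 * x
print(y.grad_fn)   # e.g. <AddBackward0 object at ...>

# Replay the recorded operations backwards to obtain dy/dx
y.backward()
print(x.grad)      # tensor(7.)
```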

TorchScript Basics

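TorchScript provides tools to capture the definition of your model, even given the flexible and dynamic nature of PyTorch. Let's start with what we call tracing.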

For example:

class MyCell(torch.nn.Module):
    def __init__(self):
        super(MyCell, self).__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x, h):
        new_h = torch.tanh(self.linear(x) + h)
        return new_h, new_h

my_cell = MyCell()
x, h = torch.rand(3, 4), torch.rand(3, 4)
traced_cell = torch.jit.trace(my_cell, (x, h))
print(traced_cell)
traced_cell(x, h)

# Output
MyCell(
  original_name=MyCell
  (linear): Linear(original_name=Linear)
)

Here we called torch.jit.trace, passing in the Module and example inputs that it accepts.

It invokes the Module, records the operations that occur while the Module runs, and creates an instance of torch.jit.ScriptModule (of which TracedModule is an instance).

TorchScript records the model definition in an Intermediate Representation (IR), commonly referred to in deep learning as a graph. We can inspect it with the .graph property:

print(traced_cell.graph)
# Output
graph(%self.1 : __torch__.MyCell,
      %input : Float(3, 4, strides=[4, 1], requires_grad=0, device=cpu),
      %h : Float(3, 4, strides=[4, 1], requires_grad=0, device=cpu)):
  %21 : __torch__.torch.nn.modules.linear.Linear = prim::GetAttr[name="linear"](%self.1)
  %23 : Tensor = prim::CallMethod[name="forward"](%21, %input)
  %14 : int = prim::Constant[value=1]() # /var/lib/jenkins/workspace/beginner_source/Intro_to_TorchScript_tutorial.py:188:0
  %15 : Float(3, 4, strides=[4, 1], requires_grad=1, device=cpu) = aten::add(%23, %h, %14) # /var/lib/jenkins/workspace/beginner_source/Intro_to_TorchScript_tutorial.py:188:0
  %16 : Float(3, 4, strides=[4, 1], requires_grad=1, device=cpu) = aten::tanh(%15) # /var/lib/jenkins/workspace/beginner_source/Intro_to_TorchScript_tutorial.py:188:0
  %17 : (Float(3, 4, strides=[4, 1], requires_grad=1, device=cpu), Float(3, 4, strides=[4, 1], requires_grad=1, device=cpu)) = prim::TupleConstruct(%16, %16)
  return (%17)
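The graph can also be walked from Python. This sketch repeats the gate-free MyCell so it runs on its own and collects the kind of every node in the traced graph. Graph.nodes() and Node.kind() come from TorchScript's internal torch._C bindings, so treat the exact list as version-dependent; the aten::add and aten::tanh nodes correspond to the lines shown in the printed graph.

```python
import torch

class MyCell(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x, h):
        new_h = torch.tanh(self.linear(x) + h)
        return new_h, new_h

x, h = torch.rand(3, 4), torch.rand(3, 4)
traced_cell = torch.jit.trace(MyCell(), (x, h))

# Each IR node has a kind string such as "prim::GetAttr" or "aten::tanh"
kinds = [node.kind() for node in traced_cell.graph.nodes()]
print(kinds)
```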

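However, this is a very low-level representation, and most of the information in the graph is not useful to end users. Instead, we can use the .code property to get a Python-syntax interpretation of the code: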

print(traced_cell.code)
# Output
def forward(self,
    input: Tensor,
    h: Tensor) -> Tuple[Tensor, Tensor]:
  _0 = torch.add((self.linear).forward(input, ), h, alpha=1)
  _1 = torch.tanh(_0)
  return (_1, _1)

So why did we do all this? There are several reasons:

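TorchScript code can be invoked in its own interpreter, which is basically a restricted Python interpreter. This interpreter does not acquire the Global Interpreter Lock, so many requests can be processed on the same instance simultaneously.
This format allows us to save the whole model to disk and load it into another environment, such as a server written in a language other than Python.
TorchScript gives us a representation on which we can perform compiler optimizations to provide more efficient execution.
TorchScript allows us to interface with many backend/device runtimes.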

We can see that invoking traced_cell produces the same results as the Python module:

print(my_cell(x, h))
print(traced_cell(x, h))
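The same comparison can be made programmatically with torch.allclose. This sketch repeats the gate-free MyCell so it runs standalone. The second pair of inputs, x2 and h2, is an extra illustrative check that the trace is not tied to the example inputs; that holds here only because this MyCell contains no control flow.

```python
import torch

class MyCell(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x, h):
        new_h = torch.tanh(self.linear(x) + h)
        return new_h, new_h

my_cell = MyCell()
x, h = torch.rand(3, 4), torch.rand(3, 4)
traced_cell = torch.jit.trace(my_cell, (x, h))

# Same inputs that were used for tracing
eager_out, _ = my_cell(x, h)
traced_out, _ = traced_cell(x, h)
print(torch.allclose(eager_out, traced_out))  # True

# Fresh inputs of the same shape: still identical, since there are no branches
x2, h2 = torch.rand(3, 4), torch.rand(3, 4)
print(torch.allclose(my_cell(x2, h2)[0], traced_cell(x2, h2)[0]))  # True
```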

Converting Modules with Scripting

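For models with control flow, torch.jit.trace() cannot capture that control flow: it only records the operations that were actually executed, and operations on branches that did not run are never recorded. For example: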

class MyDecisionGate(torch.nn.Module):
    def forward(self, x):
        if x.sum() > 0:
            return x
        else:
            return -x

class MyCell(torch.nn.Module):
    def __init__(self, dg):
        super(MyCell, self).__init__()
        self.dg = dg
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x, h):
        new_h = torch.tanh(self.dg(self.linear(x)) + h)
        return new_h, new_h

my_cell = MyCell(MyDecisionGate())
traced_cell = torch.jit.trace(my_cell, (x, h))

print(traced_cell.dg.code)
print(traced_cell.code)

# Output
def forward(self,
    argument_1: Tensor) -> None:
  return None

def forward(self,
    input: Tensor,
    h: Tensor) -> Tuple[Tensor, Tensor]:
  _0 = self.dg
  _1 = (self.linear).forward(input, )
  _2 = (_0).forward(_1, )
  _3 = torch.tanh(torch.add(_1, h, alpha=1))
  return (_3, _3)

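Looking at the .code output, the if-else branch is nowhere to be found: tracing records only the path taken with the example input, so the control flow is erased.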

To solve this, we can use the script compiler, which directly analyzes your Python source code and transforms it into TorchScript. For example:

scripted_gate = torch.jit.script(MyDecisionGate())

my_cell = MyCell(scripted_gate)
scripted_cell = torch.jit.script(my_cell)

print(scripted_gate.code)
print(scripted_cell.code)

# Output
def forward(self,
    x: Tensor) -> Tensor:
  _0 = bool(torch.gt(torch.sum(x, dtype=None), 0))
  if _0:
    _1 = x
  else:
    _1 = torch.neg(x)
  return _1

def forward(self,
    x: Tensor,
    h: Tensor) -> Tuple[Tensor, Tensor]:
  _0 = (self.dg).forward((self.linear).forward(x, ), )
  new_h = torch.tanh(torch.add(_0, h, alpha=1))
  return (new_h, new_h)

The control flow is now preserved, and the scripted module can be used for inference as usual.
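To see why this matters at run time, the sketch below traces and scripts the same gated module and feeds both an input that takes the other branch. SignGate is a hypothetical variant of MyDecisionGate in which both branches perform an operation (multiply by 2 or by -2), which keeps the traced result easy to read. Tracing with a positive input bakes in the x * 2 branch, and PyTorch emits a TracerWarning about the data-dependent if.

```python
import torch

class SignGate(torch.nn.Module):
    # Hypothetical variant of MyDecisionGate, for illustration only
    def forward(self, x):
        if x.sum() > 0:
            return x * 2
        else:
            return x * -2

pos, neg = torch.ones(3), -torch.ones(3)

# Tracing with a positive example records only the "x * 2" path
traced_gate = torch.jit.trace(SignGate(), pos)
# Scripting compiles the if/else from the source code
scripted_gate = torch.jit.script(SignGate())

print(traced_gate(neg))    # tensor([-2., -2., -2.])  wrong branch for this input
print(scripted_gate(neg))  # tensor([2., 2., 2.])     correct branch
```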

Mixing Scripting and Tracing

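Some situations call for using tracing rather than scripting (for example, a module has many architectural decisions based on constant Python values that we would not like to appear in TorchScript). In this case, scripting can be composed with tracing: torch.jit.script will inline the code of a traced module, and tracing will inline the code of a scripted module.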

An example of the first case, scripting a module that contains a traced submodule:

class MyRNNLoop(torch.nn.Module):
    def __init__(self):
        super(MyRNNLoop, self).__init__()
        self.cell = torch.jit.trace(MyCell(scripted_gate), (x, h))

    def forward(self, xs):
        h, y = torch.zeros(3, 4), torch.zeros(3, 4)
        for i in range(xs.size(0)):
            y, h = self.cell(xs[i], h)
        return y, h

rnn_loop = torch.jit.script(MyRNNLoop())
print(rnn_loop.code)

# Output:
def forward(self,
    xs: Tensor) -> Tuple[Tensor, Tensor]:
  h = torch.zeros([3, 4], dtype=None, layout=None, device=None, pin_memory=None)
  y = torch.zeros([3, 4], dtype=None, layout=None, device=None, pin_memory=None)
  y0 = y
  h0 = h
  for i in range(torch.size(xs, 0)):
    _0 = (self.cell).forward(torch.select(xs, 0, i), h0, )
    y1, h1, = _0
    y0, h0 = y1, h1
  return (y0, h0)
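In the first example the for loop is scripted while the cell is traced. A self-contained sketch of the benefit, reusing the gate-free MyCell from the tracing section (the sequence lengths 2 and 5 are arbitrary): because the loop bound is read from xs at run time, one scripted module handles sequences of different lengths.

```python
import torch

class MyCell(torch.nn.Module):
    # No control flow, so tracing this cell is safe
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x, h):
        new_h = torch.tanh(self.linear(x) + h)
        return new_h, new_h

class MyRNNLoop(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Traced submodule used inside the scripted loop below
        self.cell = torch.jit.trace(MyCell(), (torch.rand(3, 4), torch.rand(3, 4)))

    def forward(self, xs):
        h, y = torch.zeros(3, 4), torch.zeros(3, 4)
        for i in range(xs.size(0)):
            y, h = self.cell(xs[i], h)
        return y, h

rnn_loop = torch.jit.script(MyRNNLoop())

# The scripted for loop runs xs.size(0) steps, whatever that is
for steps in (2, 5):
    y, h = rnn_loop(torch.rand(steps, 3, 4))
    print(steps, tuple(y.shape))   # 2 (3, 4) / 5 (3, 4)
```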

And an example of the second case, tracing a module that contains a scripted submodule:

class WrapRNN(torch.nn.Module):
    def __init__(self):
        super(WrapRNN, self).__init__()
        self.loop = torch.jit.script(MyRNNLoop())

    def forward(self, xs):
        y, h = self.loop(xs)
        return torch.relu(y)

traced = torch.jit.trace(WrapRNN(), (torch.rand(10, 3, 4)))
print(traced.code)

# Output:
def forward(self,
    argument_1: Tensor) -> Tensor:
  _0, h, = (self.loop).forward(argument_1, )
  return torch.relu(h)

This way, scripting and tracing can each be used where the situation calls for it, and the two can be used together.

Saving and Loading Models

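We provide APIs to save and load TorchScript modules to and from disk in an archive format. This format includes code, parameters, attributes, and debug information, meaning that the archive is a freestanding representation of the model that can be loaded in an entirely separate process. For example: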

traced.save('wrapped_rnn.pt')

loaded = torch.jit.load('wrapped_rnn.pt')

print(loaded)
print(loaded.code)

# Output
RecursiveScriptModule(
  original_name=WrapRNN
  (loop): RecursiveScriptModule(
    original_name=MyRNNLoop
    (cell): RecursiveScriptModule(
      original_name=MyCell
      (dg): RecursiveScriptModule(original_name=MyDecisionGate)
      (linear): RecursiveScriptModule(original_name=Linear)
    )
  )
)
def forward(self,
    argument_1: Tensor) -> Tensor:
  _0, h, = (self.loop).forward(argument_1, )
  return torch.relu(h)

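As shown above, serialization preserves the module hierarchy and the code we have been examining throughout. The model can also be loaded elsewhere, for example into C++, for execution without Python.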
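Saving does not have to go through a file on disk. This sketch assumes, for illustration, the gate-free MyCell from the tracing section; it writes a scripted module into an in-memory io.BytesIO buffer with torch.jit.save, reloads it with torch.jit.load, and checks that the reloaded module computes the same result.

```python
import io
import torch

class MyCell(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x, h):
        new_h = torch.tanh(self.linear(x) + h)
        return new_h, new_h

scripted = torch.jit.script(MyCell())

# torch.jit.save accepts a file name or a writable file-like object
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)

# Rewind before reading the archive back
buffer.seek(0)
loaded = torch.jit.load(buffer)

x, h = torch.rand(3, 4), torch.rand(3, 4)
print(torch.allclose(scripted(x, h)[0], loaded(x, h)[0]))  # True
print(loaded.code)
```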

Loading a TorchScript Model in C++

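For production scenarios with low-latency and strict deployment requirements, C++ is often the language of choice, whereas Python favors dynamism and flexibility. Below we describe how to go from an existing Python model to a serialized representation that can be loaded and executed purely from C++, with no dependency on Python.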

Step 1: Converting Your PyTorch Model to Torch Script

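Torch Script is what takes a PyTorch model from Python to C++. It is a representation of a PyTorch model that can be understood, compiled, and serialized by the Torch Script compiler.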

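There are two ways to convert a PyTorch model to Torch Script:

The first is the tracing mechanism, which records the operations executed during one run of the model; it suits models that make little or no use of control flow.
The second is to add explicit annotations to your model that inform the Torch Script compiler that it may directly parse and compile your model code, subject to the constraints imposed by the Torch Script language.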

Converting to Torch Script via Tracing

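To convert a model via tracing, pass an instance of the model together with an example input to torch.jit.trace. This produces a torch.jit.ScriptModule object with the trace of the model's forward method embedded in it: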

import torch
import torchvision

# An instance of your model.
model = torchvision.models.resnet18()

# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)

# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)

The traced ScriptModule can now be evaluated just like a regular PyTorch module:

In[1]: output = traced_script_module(torch.ones(1, 3, 224, 224))
In[2]: output[0, :5]
Out[2]: tensor([-0.2698, -0.0381,  0.4023, -0.3010, -0.0448], grad_fn=<SliceBackward>)

Converting to Torch Script via Annotation

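In some cases, for example when your model uses particular forms of control flow, you may want to write your model in Torch Script directly. Suppose you have the following vanilla PyTorch model: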

import torch

class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output

Because the forward method of this module uses control flow that depends on the input, it is not suitable for tracing. Instead, we can convert it to a ScriptModule by compiling the module with torch.jit.script, as follows:

class MyModule(torch.nn.Module):
    def __init__(self, N, M):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    def forward(self, input):
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        return output

my_module = MyModule(10,20)
sm = torch.jit.script(my_module)

If you need to exclude some methods in your nn.Module because they use Python features that TorchScript does not support, you can annotate them with @torch.jit.ignore.

The resulting sm is an instance of ScriptModule and is ready for serialization.
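A sketch of @torch.jit.ignore, modeled on the MyModule above. The class name MyModuleWithLogging and the log_norm helper are hypothetical, added for illustration: log_norm calls NumPy, which TorchScript cannot compile, so it is marked as ignored and the compiled forward calls back into the Python interpreter at that point.

```python
import numpy as np
import torch

class MyModuleWithLogging(torch.nn.Module):
    # Hypothetical variant of MyModule, for illustration only
    def __init__(self, N, M):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.rand(N, M))

    @torch.jit.ignore
    def log_norm(self, t: torch.Tensor) -> None:
        # Uses NumPy, which TorchScript cannot compile; left as a Python call
        print("norm:", np.linalg.norm(t.detach().numpy()))

    def forward(self, input):
        if input.sum() > 0:
            output = self.weight.mv(input)
        else:
            output = self.weight + input
        self.log_norm(output)
        return output

sm = torch.jit.script(MyModuleWithLogging(10, 20))
print(sm(torch.ones(20)).shape)   # torch.Size([10])
print(sm(-torch.ones(20)).shape)  # torch.Size([10, 20])
```

According to the TorchScript documentation, a module whose compiled code calls an @torch.jit.ignore method cannot be saved with save(), because the call goes into Python; @torch.jit.unused, which replaces the method with one that raises an exception, is the alternative when the model must stay exportable.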

Step 2: Serializing Your Script Module to a File

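Once you have a ScriptModule, whether from tracing or from annotating a PyTorch model, you can serialize it to a file. You can then load the module from this file in C++ and run it without any dependency on Python. To serialize the ResNet18 model from the tracing example above, simply call save on the module: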

traced_script_module.save("traced_resnet_model.pt")

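This produces a traced_resnet_model.pt file in your working directory. If you also want to serialize the scripted module from the annotation example, call sm.save("my_module_model.pt"). We are now ready to move over to C++.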

Step 3: Loading Your Script Module in C++

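To load your serialized PyTorch model in C++, your application must depend on the PyTorch C++ API, also known as LibTorch. The LibTorch distribution is a collection of shared libraries, header files, and CMake build configuration files. Below we build a minimal C++ application that simply loads a serialized PyTorch model.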

A Minimal C++ Application

Let's begin with code that loads a module:

#include <torch/script.h> // One-stop header.

#include <iostream>
#include <memory>

int main(int argc, const char* argv[]) {
  if (argc != 2) {
    std::cerr << "usage: example-app <path-to-exported-script-module>\n";
    return -1;
  }


  torch::jit::script::Module module;
  try {
    // Deserialize the ScriptModule from a file using torch::jit::load().
    module = torch::jit::load(argv[1]);
  }
  catch (const c10::Error& e) {
    std::cerr << "error loading the model\n";
    return -1;
  }

  std::cout << "ok\n";
}

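The <torch/script.h> header pulls in all the LibTorch includes needed to run this example. Our application accepts the path to a serialized PyTorch ScriptModule as its only command-line argument and deserializes it with torch::jit::load(), which returns a torch::jit::script::Module object.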

Depending on LibTorch and Building the Application

Save the code above as example-app.cpp, then write a minimal CMakeLists.txt to build it:

cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(custom_ops)

find_package(Torch REQUIRED)

add_executable(example-app example-app.cpp)
target_link_libraries(example-app "${TORCH_LIBRARIES}")
# Recent LibTorch releases require C++17
set_property(TARGET example-app PROPERTY CXX_STANDARD 17)

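Finally, we need the LibTorch distribution itself, which can be downloaded from the download page on the PyTorch website. After downloading and unzipping it, you should get a folder with the following directory structure: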

libtorch/
  bin/
  include/
  lib/
  share/

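The lib/ folder contains the shared libraries you must link against.
The include/ folder contains the header files your program needs to include.
The share/ folder contains the CMake configuration that makes the find_package(Torch) command above work.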

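The last step is to build the application as usual. With the example directory laid out as below, run the following commands from inside example-app/, then run the resulting binary on the traced model: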

example-app/
  CMakeLists.txt
  example-app.cpp

mkdir build
cd build
cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
cmake --build . --config Release

root@4b5a67132e81:/example-app/build# ./example-app <path_to_model>/traced_resnet_model.pt
ok

Step 4: Executing the Script Module in C++

Once the model is loaded, executing it takes only a few more lines in main():

// Create a vector of inputs.
std::vector<torch::jit::IValue> inputs;
inputs.push_back(torch::ones({1, 3, 224, 224}));

// Execute the model and turn its output into a tensor.
at::Tensor output = module.forward(inputs).toTensor();
std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';

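Here we create a vector of torch::jit::IValue (the type-erased value type that script::Module methods accept and return) and add a single input. Calling forward returns a new IValue, which we convert to a tensor with toTensor().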

Note

To run the model on a GPU, move the module to GPU memory with module.to(at::kCUDA);. Likewise, move the inputs with tensor.to(at::kCUDA), which returns a new tensor in CUDA memory.

Step 5: Getting Help and Exploring the API

The following links may help:

Torch Script reference: https://pytorch.org/docs/master/jit.html
PyTorch C++ API documentation: https://pytorch.org/cppdocs/
PyTorch Python API documentation: https://pytorch.org/docs/

Exporting a PyTorch Model to ONNX and Running It with ONNX Runtime

ONNX Runtime is a high-performance inference engine for ONNX models that runs efficiently across many platforms and hardware targets.

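You need to install ONNX and ONNX Runtime, both of which can be installed with pip. Note that at the time of writing, the compatible Python versions were 3.5 to 3.7.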

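After defining the model as usual, call torch_model.eval() or torch_model.train(False) before exporting to put the model in inference mode. This is required because operators such as dropout and batchnorm behave differently in inference and training mode.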

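import torch
import torch.utils.model_zoo as model_zoo

# torch_model is assumed to be an already-constructed model instance;
# its definition is not shown in this post.

# Load pretrained model weights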
model_url = 'https://s3.amazonaws.com/pytorch/test_data/export/superres_epoch100-44c6958e.pth'
batch_size = 1    # just a random number

# Initialize model with the pretrained weights
map_location = lambda storage, loc: storage
if torch.cuda.is_available():
    map_location = None
torch_model.load_state_dict(model_zoo.load_url(model_url, map_location=map_location))

# set the model to inference mode
torch_model.eval()
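To see why the mode matters, this sketch uses a standalone torch.nn.Dropout layer (p=0.5 and the all-ones input are arbitrary choices). In training mode dropout zeroes elements at random and scales the survivors by 1/(1-p); after train(False), which is equivalent to eval(), it passes its input through unchanged, which is the behavior an exported inference model should capture.

```python
import torch

drop = torch.nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: random elements become 0, the rest are scaled to 1 / (1 - 0.5) = 2
drop.train()
train_out = drop(x)

# train(False) has the same effect as eval()
drop.train(False)
eval_out = drop(x)

print(drop.training)                   # False
print(sorted(train_out.unique().tolist()))  # [0.0, 2.0] (with overwhelming probability)
print(torch.equal(eval_out, x))        # True: dropout is the identity in eval mode
```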

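Exporting a model in PyTorch works via tracing or scripting. This tutorial uses tracing: we call torch.onnx.export(), which executes the model and records a trace of the operators used to compute the outputs. Because export runs the model, we must supply an input tensor x. The sizes of this input are fixed in the exported ONNX graph for all input dimensions unless a dimension is declared as a dynamic axis. Here we export with batch_size 1 and then mark the first dimension as dynamic through the dynamic_axes parameter of torch.onnx.export().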

The exported model therefore accepts inputs of shape [batch_size, 1, 224, 224], where batch_size can vary.

# Input to the model
x = torch.randn(batch_size, 1, 224, 224, requires_grad=True)
torch_out = torch_model(x)

# Export the model
torch.onnx.export(torch_model,               # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  "super_resolution.onnx",   # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=10,          # the ONNX version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names = ['input'],   # the model's input names
                  output_names = ['output'], # the model's output names
                  dynamic_axes={'input' : {0 : 'batch_size'},    # variable length axes
                                'output' : {0 : 'batch_size'}})

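We can then verify that ONNX Runtime produces the same outputs as PyTorch. First, check that the exported model has a valid structure: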

import onnx

onnx_model = onnx.load("super_resolution.onnx")
onnx.checker.check_model(onnx_model)

Next, run the model with ONNX Runtime and compare the results:

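import numpy as np
import onnxruntime

# List the execution provider explicitly; newer onnxruntime builds with extra providers (e.g. GPU) require this
ort_session = onnxruntime.InferenceSession("super_resolution.onnx", providers=["CPUExecutionProvider"])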

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)

# compare ONNX Runtime and PyTorch results
np.testing.assert_allclose(to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05)

print("Exported model has been tested with ONNXRuntime, and the result looks good!")

Original post: https://blog.csdn.net/u012457196/article/details/115748710