TVM - GCN

This post shows how to define a graph convolutional network (GCN) with TVM Relay (v0.6). It follows the official tutorial Building a Graph Convolutional Network, but adds more performance comparisons.

Relay can in fact be thought of as a deep learning framework in its own right: the operators supported by TensorFlow, PyTorch, and MXNet are, for the most part, also supported by Relay.

Here we only consider the simplest form of GCN, namely

\[H^{(k+1)}=AH^{(k)}W^{(k)} =(((W^{(k)})^T(H^{(k)})^T)A^T)^T \implies (H^{(k+1)})^T=((W^{(k)})^T(H^{(k)})^T)A^T\]

The reason for rewriting it in the transposed form on the right is that TVM Relay's operator library currently only supports right-multiplication by a sparse matrix (nn.sparse_dense). With that, we can build the GCN [ICLR'17] in both DGL-PyTorch and TVM Relay and compare their performance; see the code below.
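Below is a condensed sketch of the Relay side only, following the structure of the official tutorial (the DGL-PyTorch baseline is omitted). The names features (the 2708 x 1433 Cora feature matrix), adj_csr (a scipy.sparse CSR adjacency matrix with self-loops), norm_arr (the (2708, 1) vector of degree^-1/2 normalization factors), and params (the weights trained in DGL-PyTorch, keyed "layers.0.weight", "layers.0.bias", ...) are placeholders assumed to be prepared beforehand.

from collections import namedtuple
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_runtime  # tvm-0.7+: tvm.contrib.graph_executor

def GraphConv(layer_name, input_dim, output_dim, adj, input, norm=None,
              bias=True, activation=None):
    # One GCN layer, H' = norm * (A (norm * H) W), computed in the transposed
    # form (H')^T = (W^T H^T) A^T, since nn.sparse_dense only supports
    # right-multiplication by a sparse matrix.
    if norm is not None:
        input = relay.multiply(input, norm)
    weight = relay.var(layer_name + ".weight", shape=(input_dim, output_dim))
    weight_t = relay.transpose(weight)
    dense = relay.nn.dense(weight_t, input)     # (W^T)(H^T): (output_dim, N)
    output = relay.nn.sparse_dense(dense, adj)  # right-multiply by the sparse adjacency
    output_t = relay.transpose(output)          # back to (N, output_dim)
    if norm is not None:
        output_t = relay.multiply(output_t, norm)
    if bias:
        _bias = relay.var(layer_name + ".bias", shape=(output_dim,))
        output_t = relay.nn.bias_add(output_t, _bias, axis=-1)
    if activation is not None:
        output_t = activation(output_t)
    return output_t

# Adjacency matrix in CSR form, embedded as Relay constants.
g_data = relay.Constant(tvm.nd.array(adj_csr.data.astype("float32")))
indices = relay.Constant(tvm.nd.array(adj_csr.indices.astype("int32")))
indptr = relay.Constant(tvm.nd.array(adj_csr.indptr.astype("int32")))
Adjacency = namedtuple("Adjacency", ["data", "indices", "indptr"])
adj = Adjacency(g_data, indices, indptr)
norm = relay.Constant(tvm.nd.array(norm_arr))   # (2708, 1), degree^-0.5

# Two-layer GCN for Cora: 1433 -> 16 -> 7.
infeats = relay.var("infeats", shape=features.shape)
h = GraphConv("layers.0", 1433, 16, adj, infeats, norm, activation=relay.nn.relu)
h = GraphConv("layers.1", 16, 7, adj, h, norm, activation=None)
func = relay.Function(relay.analysis.free_vars(h), h)
mod = relay.Module.from_expr(func)  # tvm-0.7+: tvm.IRModule.from_expr

# Compile for CPU and run with the weights trained in DGL-PyTorch.
target, ctx = "llvm", tvm.cpu(0)
params = dict(params)
params["infeats"] = features
with relay.build_config(opt_level=0):
    graph, lib, build_params = relay.build(mod, target, params=params)
m = graph_runtime.create(graph, lib, ctx)
m.set_input(**build_params)
m.run()
logits = m.get_output(0).asnumpy()

# Mean / std of the inference time, as reported in the output below.
timer = m.module.time_evaluator("run", ctx, number=10, repeat=10)
prof = np.array(timer().results) * 1000  # milliseconds
print("Mean inference time (std dev): %.2f ms (%.2f ms)" % (prof.mean(), prof.std()))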



The output is shown below. Even without any tuning, TVM's out-of-the-box performance is already roughly on par with the CPU version of DGL. (Versions: TVM v0.6, DGL v0.4.2)

Print the first five outputs from DGL-PyTorch execution
 tensor([[-2.1450, -1.3411,  3.6003,  0.1190, -0.9690, -1.4138, -1.0221],
        [ 0.6063, -1.1822, -0.3693, -0.9476,  0.5819,  1.2295,  0.2738],
        [-1.0876,  0.0565, -0.2626, -0.9312,  2.7453, -0.0950, -0.6077],
        [-1.4724,  0.0105, -0.1038, -0.2135,  2.2795, -0.5208, -0.5848],
        [-1.6528, -1.2279,  0.0541,  3.6322, -1.6748, -1.6925,  0.1763]])
Test accuracy of DGL results: 81.40%
DGL running time: 6.89 ms
Cannot find config for target=llvm, workload=('dense', (7, 16, 'float32'), (2708, 16, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm, workload=('dense', (16, 1433, 'float32'), (2708, 1433, 'float32'), 0, 'float32'). A fallback configuration is used, which may bring great performance regression.
Print the first five outputs from TVM execution
 tensor([[-2.1450, -1.3411,  3.6003,  0.1190, -0.9690, -1.4138, -1.0221],
        [ 0.6063, -1.1822, -0.3693, -0.9476,  0.5819,  1.2295,  0.2738],
        [-1.0876,  0.0565, -0.2626, -0.9312,  2.7453, -0.0950, -0.6077],
        [-1.4724,  0.0105, -0.1038, -0.2135,  2.2795, -0.5208, -0.5848],
        [-1.6528, -1.2279,  0.0541,  3.6322, -1.6748, -1.6925,  0.1763]])
Test accuracy of TVM results: 81.40%
Mean inference time (std dev): 6.41 ms (0.11 ms)

Unfortunately, TVM v0.6 does not yet provide the from_pytorch model importer, so every network has to be redefined in Relay from scratch; the approach taken in the newer tutorials cannot be used here.
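For reference, in later releases (v0.7+) the PyTorch frontend looks roughly like the sketch below. model and features are hypothetical placeholders for a trained PyTorch module and its input, and all ops used by the model must be convertible (DGL's message-passing kernels may not be), so this is only an illustration of the import flow, not the author's setup.

import torch
from tvm import relay

scripted = torch.jit.trace(model, features)  # TorchScript via tracing
mod, params = relay.frontend.from_pytorch(scripted, [("infeats", features.shape)])
print(mod)  # Relay IR, without redefining the network by hand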

Within the code, the snippet

func = relay.Function(...)
mod = relay.Module.from_expr(func)  # in tvm-0.7+, use tvm.IRModule.from_expr
print(mod)

prints the computational graph (i.e., the data-flow graph) generated as Relay IR.

v0.0.4
def @main(%layers.1.weight: Tensor[(16, 7), float32], %layers.0.weight: Tensor[(1433, 16), float32], %infeats: Tensor[(2708, 1433), float32], %layers.0.bias: Tensor[(16), float32], %layers.1.bias: Tensor[(7), float32]) -> Tensor[(2708, 7), float32] {
  %0 = transpose(%layers.1.weight, axes=None) /* ty=Tensor[(7, 16), float32] */;
  %1 = transpose(%layers.0.weight, axes=None) /* ty=Tensor[(16, 1433), float32] */;
  %2 = multiply(%infeats, meta[relay.Constant][0] /* ty=Tensor[(2708, 1), float32] */ /* ty=Tensor[(2708, 1), float32] */) /* ty=Tensor[(2708, 1433), float32] */;
  %3 = nn.dense(%1, %2, units=None) /* ty=Tensor[(16, 2708), float32] */;
  %4 = nn.sparse_dense(%3, meta[relay.Constant][1] /* ty=Tensor[(13264), float32] */ /* ty=Tensor[(13264), float32] */, meta[relay.Constant][2] /* ty=Tensor[(13264), int32] */ /* ty=Tensor[(13264), int32] */, meta[relay.Constant][3] /* ty=Tensor[(2709), int32] */ /* ty=Tensor[(2709), int32] */) /* ty=Tensor[(16, 2708), float32] */;
  %5 = transpose(%4, axes=None) /* ty=Tensor[(2708, 16), float32] */;
  %6 = multiply(%5, meta[relay.Constant][0] /* ty=Tensor[(2708, 1), float32] */ /* ty=Tensor[(2708, 1), float32] */) /* ty=Tensor[(2708, 16), float32] */;
  %7 = nn.bias_add(%6, %layers.0.bias, axis=-1) /* ty=Tensor[(2708, 16), float32] */;
  %8 = nn.relu(%7) /* ty=Tensor[(2708, 16), float32] */;
  %9 = multiply(%8, meta[relay.Constant][0] /* ty=Tensor[(2708, 1), float32] */ /* ty=Tensor[(2708, 1), float32] */) /* ty=Tensor[(2708, 16), float32] */;
  %10 = nn.dense(%0, %9, units=None) /* ty=Tensor[(7, 2708), float32] */;
  %11 = nn.sparse_dense(%10, meta[relay.Constant][1] /* ty=Tensor[(13264), float32] */ /* ty=Tensor[(13264), float32] */, meta[relay.Constant][2] /* ty=Tensor[(13264), int32] */ /* ty=Tensor[(13264), int32] */, meta[relay.Constant][3] /* ty=Tensor[(2709), int32] */ /* ty=Tensor[(2709), int32] */) /* ty=Tensor[(7, 2708), float32] */;
  %12 = transpose(%11, axes=None) /* ty=Tensor[(2708, 7), float32] */;
  %13 = multiply(%12, meta[relay.Constant][0] /* ty=Tensor[(2708, 1), float32] */ /* ty=Tensor[(2708, 1), float32] */) /* ty=Tensor[(2708, 7), float32] */;
  nn.bias_add(%13, %layers.1.bias, axis=-1) /* ty=Tensor[(2708, 7), float32] */
}

// meta data omitted. you can use show_meta_data=True to include meta data
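As the last line of the dump notes, the constants embedded as meta data are omitted by default. If I recall the API correctly, they can be included by printing the text format explicitly:

print(mod.astext(show_meta_data=True))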