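This post records how I installed HeteroCL in a Python 3 and LLVM 9.0 environment.
I had previously installed TVM against LLVM 9.0, whereas this version of HeteroCL supports only LLVM 6.0, so building directly with make as the official tutorial describes produces errors. Below are the two problems I ran into, together with their fixes. The first is a set of compile errors against the LLVM 9.0 headers: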
src/codegen/llvm/codegen_llvm.cc: In member function ‘virtual void TVM::codegen::CodeGenLLVM::VisitStmt_(const Halide::Internal::Print*)’:
src/codegen/llvm/codegen_llvm.cc:1423:110: error: no matching function for call to ‘cast<llvm::Function>(llvm::FunctionCallee)’
llvm::Function* printf_call = llvm::cast<llvm::Function>(module_->getOrInsertFunction("printf", call_ftype));
include/llvm/Support/Casting.h:256:44: note: conversion of argument 1 would be ill-formed:
src/codegen/llvm/codegen_llvm.cc:1423:88: error: cannot bind non-const lvalue reference of type ‘llvm::FunctionCallee& ’ to an rvalue of type ‘llvm::FunctionCallee’
llvm::Function* printf_call = llvm::cast<llvm::Function>(module_->getOrInsertFunction("printf", call_ftype));
clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/include/llvm/Target/TargetMachine.h:289:16: note: candidate expects 6 arguments, 3 provided
src/codegen/llvm/llvm_module.cc:89:32: error: invalid initialization of reference of type ‘const llvm::Module&’ from expression of type ‘llvm::Module*’
llvm::WriteBitcodeToFile(mptr_, dest);
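The cast error and the WriteBitcodeToFile error both come from API changes visible in the log itself: under LLVM 9.0, getOrInsertFunction returns an llvm::FunctionCallee instead of something that can be cast directly, and llvm::WriteBitcodeToFile takes the module by const reference instead of by pointer. The following is a minimal sketch of one possible source patch, written as a Python helper that rewrites the two statements quoted above. The name patch_llvm9 is hypothetical, and the replacements are an assumption about a workable fix, not the official HeteroCL change. The error about a call expecting 6 arguments is not handled, because the log does not show the offending statement.

```python
# Hypothetical helper: rewrites the two statements quoted in the error log
# so that they match the LLVM 9 signatures. Not the official HeteroCL patch.

# (old statement, new statement) pairs; the old text is copied from the log.
LLVM9_REWRITES = [
    (
        'llvm::cast<llvm::Function>(module_->getOrInsertFunction("printf", call_ftype))',
        # FunctionCallee::getCallee() returns the underlying llvm::Value*,
        # which llvm::cast<llvm::Function> accepts.
        'llvm::cast<llvm::Function>(module_->getOrInsertFunction("printf", call_ftype).getCallee())',
    ),
    (
        "llvm::WriteBitcodeToFile(mptr_, dest)",
        # The log shows that a const llvm::Module& is expected, so dereference.
        "llvm::WriteBitcodeToFile(*mptr_, dest)",
    ),
]


def patch_llvm9(source: str) -> str:
    """Return `source` with the two LLVM 6 style calls rewritten."""
    for old, new in LLVM9_REWRITES:
        source = source.replace(old, new)
    return source
```

To use it, read src/codegen/llvm/codegen_llvm.cc and src/codegen/llvm/llvm_module.cc, pass each file's text through patch_llvm9, and write the result back. Running it twice is harmless, since the rewritten statements no longer match the old patterns.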
The second problem concerns v0.3: if a different TVM is also installed on the machine, the following error may appear (see the related issue).
: CommandLine Error: Option 'xcore-max-threads' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
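In that case, consider removing the second line of hlib/python/hlib/__init__.py,
from . import frontend
because frontend uses Relay and therefore depends on the third-party TVM library. See the related commit.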
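With the bugs fixed, adjust the settings in Makefile.config (mainly the LLVM and CMake paths), change python to python3 inside the Makefile, and remove the build-pkgs requirement. HeteroCL can then be built and installed with the commands below. (Instead of the official python setup.py install, I used a path-based installation similar to TVM's.)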
make build-tvm -j
make build-hcl -j
export HCL_HOME=/path/to/heterocl
export PYTHONPATH=$HCL_HOME/python:$HCL_HOME/hlib/python:${PYTHONPATH}
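The last two lines, which set the environment variables, can be placed in ~/.bashrc so that they do not need to be set again every time a terminal is opened.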
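To check that the path-based installation is actually picked up, the two PYTHONPATH entries can be reproduced in Python and compared against sys.path. This is a small sketch: the helper names hcl_python_paths and missing_hcl_paths are hypothetical, and it assumes HCL_HOME points at the root of the HeteroCL checkout, as in the export lines above.

```python
import os
import sys


def hcl_python_paths(hcl_home: str) -> list:
    """Return the two directories that the export lines add to PYTHONPATH."""
    return [
        os.path.join(hcl_home, "python"),
        os.path.join(hcl_home, "hlib", "python"),
    ]


def missing_hcl_paths(hcl_home: str, search_path=None) -> list:
    """Return the expected directories that are absent from the search path."""
    if search_path is None:
        search_path = sys.path
    present = {os.path.normpath(p) for p in search_path if p}
    return [
        p for p in hcl_python_paths(hcl_home)
        if os.path.normpath(p) not in present
    ]


if __name__ == "__main__":
    home = os.environ.get("HCL_HOME", "")
    if not home:
        print("HCL_HOME is not set")
    else:
        for path in missing_hcl_paths(home):
            print("not on sys.path:", path)
```

If the script prints nothing, both directories are visible to the interpreter and import heterocl should resolve to the checkout.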
To verify the installation, run the test code from Getting Started.
import numpy as np
import heterocl as hcl


def simple_compute(a, A):
    B = hcl.compute(A.shape, lambda x, y: A[x, y] + a, "B")
    """
    The above API is equivalent to the following Python code.

    for x in range(0, 10):
        for y in range(0, 10):
            B[x, y] = A[x, y] + a
    """
    return B


# Initialize HeteroCL and declare the inputs: a scalar and a 10x10 tensor.
hcl.init()
a = hcl.placeholder((), "a")
A = hcl.placeholder((10, 10), "A")

# Create a schedule, print the lowered IR, and build an executable function.
s = hcl.create_schedule([a, A], simple_compute)
print(hcl.lower(s))
f = hcl.build(s)

# Prepare the inputs and the output buffer, then run the function.
hcl_a = 10
np_A = np.random.randint(100, size=A.shape)
hcl_A = hcl.asarray(np_A)
hcl_B = hcl.asarray(np.zeros(A.shape))
f(hcl_a, hcl_A, hcl_B)

# Convert the results back to NumPy and check them.
np_A = hcl_A.asnumpy()
np_B = hcl_B.asnumpy()
assert np.array_equal(np_B, np_A + 10)

print(hcl_a)
print(np_A)
print(np_B)
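As a cross-check that does not need HeteroCL at all, the loop nest described in the docstring can be written in plain NumPy and compared against the vectorized expression used in the final assert. This is only a reference sketch; reference_compute is a hypothetical name and not part of the HeteroCL API.

```python
import numpy as np


def reference_compute(a, A):
    """Plain-Python version of B[x, y] = A[x, y] + a for a 2-D array A."""
    B = np.zeros(A.shape, dtype=A.dtype)
    for x in range(A.shape[0]):
        for y in range(A.shape[1]):
            B[x, y] = A[x, y] + a
    return B


# Same input shape and value range as the HeteroCL test above.
ref_A = np.random.randint(100, size=(10, 10))
ref_B = reference_compute(10, ref_A)
assert np.array_equal(ref_B, ref_A + 10)
```

If the HeteroCL build is working, np_B from the test above should equal reference_compute(hcl_a, np_A).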
Because HeteroCL mixes Python and C++ and uses TVM's PackedFunc mechanism for calls between the two, debugging can be somewhat difficult.
The following shows how to locate a segfault with gdb.
gdb python
(gdb) run /path/to/script.py
## wait for segfault ##
(gdb) backtrace
## stack trace of the c code
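When gdb is not at hand, Python's standard faulthandler module can at least show which Python line was executing when the crash happened. This complements the gdb backtrace rather than replacing it, since it dumps the Python stack and not the C++ frames. A minimal sketch, to be placed at the top of the script being debugged:

```python
import faulthandler

# Install handlers that dump the Python traceback of all threads to stderr
# on SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
faulthandler.enable(all_threads=True)

# ... the rest of the script (import heterocl, build, run) follows here ...
```

The same effect is available without editing the script by running python3 -X faulthandler /path/to/script.py.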