The reason is context affinity. Every CUDA function instance is associated with a context, and they are not portable (the same applies to memory allocations and texture references). Each context must therefore load the function instance separately, and then use the function handle returned by that load operation.
If you are not using metaprogramming at all, you will probably find it simpler to compile your CUDA code to a cubin file, then load the functions you need from the cubin into each context with driver.module_from_file. Cut and pasted directly from some of my production code:
    # Context establishment
    try:
        if autoinit:
            import pycuda.autoinit
            self.context = None
            self.device = pycuda.autoinit.device
            self.computecc = self.device.compute_capability()
        else:
            driver.init()
            self.context = tools.make_default_context()
            self.device = self.context.get_device()
            self.computecc = self.device.compute_capability()

        # GPU pre-initialization:
        # load pre-compiled CUDA code from a cubin file.
        # Select the cubin based on the device compute capability.
        # Kernel names contain C++ mangling because of
        # templating. Ugly, but no easy way around it.
        if self.computecc == (1, 3):
            self.fimcubin = "fim_sm13.cubin"
        elif self.computecc[0] == 2:
            self.fimcubin = "fim_sm20.cubin"
        else:
            raise NotImplementedError("GPU architecture not supported")

        fimmod = driver.module_from_file(self.fimcubin)

        IterateName32 = "_Z10fimIterateIfLj8EEvPKT_PKiPS0_PiS0_S0_S0_jjji"
        IterateName64 = "_Z10fimIterateIdLj8EEvPKT_PKiPS0_PiS0_S0_S0_jjji"
        if self.dtype == np.float32:
            IterateName = IterateName32
        elif self.dtype == np.float64:
            IterateName = IterateName64
        else:
            raise TypeError("Unsupported dtype")

        self.fimIterate = fimmod.get_function(IterateName)

    except ImportError:
        warn("Could not initialise CUDA context")
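The dispatch logic above (pick a cubin by compute capability, pick a mangled kernel name by dtype) is plain Python and needs no GPU to exercise, so it can be factored into a standalone helper and unit tested. A minimal sketch, assuming the same filename scheme and mangled names as the code above (the function name select_kernel_source is mine, not from the original code):

```python
def select_kernel_source(compute_capability, dtype_name):
    """Return (cubin filename, mangled kernel name) for a given
    device compute capability tuple and dtype name string.
    Mirrors the dispatch in the snippet above."""
    if compute_capability == (1, 3):
        cubin = "fim_sm13.cubin"
    elif compute_capability[0] == 2:
        cubin = "fim_sm20.cubin"
    else:
        raise NotImplementedError("GPU architecture not supported")

    # The two mangled names differ only in the template type
    # character: 'f' for float, 'd' for double.
    names = {
        "float32": "_Z10fimIterateIfLj8EEvPKT_PKiPS0_PiS0_S0_S0_jjji",
        "float64": "_Z10fimIterateIdLj8EEvPKT_PKiPS0_PiS0_S0_S0_jjji",
    }
    try:
        return cubin, names[dtype_name]
    except KeyError:
        raise TypeError("Unsupported dtype: %s" % dtype_name)
```

Keeping this selection separate from the context setup makes it easy to verify the capability/dtype matrix without touching the driver.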


