The reason is context affinity. Every CUDA function instance is associated with a context, and they are not portable (the same applies to memory allocations and texture references). Each context must therefore load the function instance separately, and then use the function handle returned by that load operation.
If you are not using metaprogramming at all, you will probably find it simpler to compile your CUDA code to a cubin file, then load the functions you need from the cubin into each context with driver.module_from_file. Cut and pasted directly from some of my production code:
    # Context establishment
    try:
        if autoinit:
            import pycuda.autoinit
            self.context = None
            self.device = pycuda.autoinit.device
            self.computecc = self.device.compute_capability()
        else:
            driver.init()
            self.context = tools.make_default_context()
            self.device = self.context.get_device()
            self.computecc = self.device.compute_capability()

        # GPU pre-initialization:
        # load pre-compiled CUDA code from a cubin file.
        # Select the cubin based on the device compute capability.
        # Kernel names contain C++ mangling because of
        # templating. Ugly, but no easy way around it.
        if self.computecc == (1, 3):
            self.fimcubin = "fim_sm13.cubin"
        elif self.computecc[0] == 2:
            self.fimcubin = "fim_sm20.cubin"
        else:
            raise NotImplementedError("GPU architecture not supported")

        fimmod = driver.module_from_file(self.fimcubin)

        IterateName32 = "_Z10fimIterateIfLj8EEvPKT_PKiPS0_PiS0_S0_S0_jjji"
        IterateName64 = "_Z10fimIterateIdLj8EEvPKT_PKiPS0_PiS0_S0_S0_jjji"
        if self.dtype == np.float32:
            IterateName = IterateName32
        elif self.dtype == np.float64:
            IterateName = IterateName64
        else:
            raise TypeError("Unsupported dtype")

        self.fimIterate = fimmod.get_function(IterateName)

    except ImportError:
        warn("Could not initialise CUDA context")
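The dispatch logic above (pick a cubin by compute capability, pick a mangled kernel name by dtype) is plain Python and needs no GPU to exercise, so it can be factored into a standalone helper and unit tested. A minimal sketch, assuming the same filename scheme and mangled names as the code above (the function name select_kernel_source is mine, not from the original code):

```python
def select_kernel_source(compute_capability, dtype_name):
    """Return (cubin filename, mangled kernel name) for a given
    device compute capability tuple and dtype name string.
    Mirrors the dispatch in the snippet above."""
    if compute_capability == (1, 3):
        cubin = "fim_sm13.cubin"
    elif compute_capability[0] == 2:
        cubin = "fim_sm20.cubin"
    else:
        raise NotImplementedError("GPU architecture not supported")

    # The two mangled names differ only in the template type
    # character: 'f' for float, 'd' for double.
    names = {
        "float32": "_Z10fimIterateIfLj8EEvPKT_PKiPS0_PiS0_S0_S0_jjji",
        "float64": "_Z10fimIterateIdLj8EEvPKT_PKiPS0_PiS0_S0_S0_jjji",
    }
    try:
        return cubin, names[dtype_name]
    except KeyError:
        raise TypeError("Unsupported dtype: %s" % dtype_name)
```

Keeping this selection separate from the context setup makes it easy to verify the capability/dtype matrix without touching the driver.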


