cuDF is a single-GPU library. 2000 files of 20 MB each would be about 40 GB of data, which is more than you can fit in memory on a single V100 GPU.
For workflows that require more than a single GPU, cuDF relies on Dask. The following example illustrates how you could use cuDF + Dask to read data into distributed GPU memory with multiple GPUs in a single node. This doesn't answer your debugging question, but should hopefully solve your problem.
First, I'll use a few lines of code to create a Dask cluster composed of two GPUs.
```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf

cluster = LocalCUDACluster()  # by default use all GPUs in the node. I have two.
client = Client(cluster)
client
# The print output of client:
#
# Client
# Scheduler: tcp://127.0.0.1:44764
# Dashboard: http://127.0.0.1:8787/status
# Cluster
# Workers: 2
# Cores: 2
# Memory: 404.27 GB
```
Next, I'll create a couple of parquet files for this example.
```python
import os
import cudf
from cudf.datasets import randomdata

if not os.path.exists('example_output'):
    os.mkdir('example_output')

for x in range(2):
    df = randomdata(nrows=10000,
                    dtypes={'a': int, 'b': str, 'c': str, 'd': int},
                    seed=12)
    df.to_parquet('example_output/df')
```

Let's look at the memory on each of my GPUs with `nvidia-smi`.
```
nvidia-smi
Thu Sep 26 19:13:46 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:AF:00.0 Off |                    0 |
| N/A   51C    P0    29W /  70W |   6836MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:D8:00.0 Off |                    0 |
| N/A   47C    P0    28W /  70W |   5750MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```
Notice the two memory values: 6836 MB on GPU 0 and 5750 MB on GPU 1 (I happen to already have unrelated data in memory on these GPUs). Now, let's read the entire directory of our two parquet files with Dask cuDF and then `persist` it. Persisting forces computation; Dask execution is lazy, so merely calling `read_parquet` only adds a task to the task graph. `ddf` is a Dask DataFrame.
```python
ddf = dask_cudf.read_parquet('example_output/df')
ddf = ddf.persist()  # persist forces execution of the lazy task graph
```

Now, let's look at `nvidia-smi` again.
```
Thu Sep 26 19:13:52 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:AF:00.0 Off |                    0 |
| N/A   51C    P0    29W /  70W |   6938MiB / 15079MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:D8:00.0 Off |                    0 |
| N/A   47C    P0    28W /  70W |   5852MiB / 15079MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```
Memory usage went up on both GPUs: Dask handled distributing our data across both of them for us.
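If the lazy-execution behavior is unfamiliar, here's a toy sketch of the idea. This is a hypothetical illustration, not Dask's actual internals: calls like `read_parquet` only record tasks in a graph, and nothing runs until you explicitly execute it, analogous to `persist()`/`compute()`.

```python
# Toy sketch of lazy task-graph execution (hypothetical, NOT Dask's real API).
graph = {}  # task name -> (function, names of dependency tasks)

def add_task(name, func, *deps):
    """Record a task in the graph without running it (the lazy step)."""
    graph[name] = (func, deps)
    return name

def execute(name):
    """Walk the graph and actually run the tasks (the eager step)."""
    func, deps = graph[name]
    return func(*(execute(d) for d in deps))

# Build the graph lazily: nothing is "read" or computed yet.
load = add_task("load", lambda: [1, 2, 3, 4])
double = add_task("double", lambda xs: [2 * x for x in xs], "load")
total = add_task("total", sum, "double")

# Only this call triggers computation, like persist()/compute() in Dask.
print(execute(total))  # prints 20
```

In real Dask, `read_parquet` similarly just adds tasks to a graph, and `persist` walks that graph on the workers, which is why GPU memory only fills up after the `persist` call.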