Notes on problems installing nvidia-docker locally

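I followed NVIDIA's installation tutorial.
I. Following the tutorial, starting a GPU container failed with this error: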

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0018] error waiting for container: context canceled 
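As far as I can tell, this error usually means the Docker daemon did not find NVIDIA's container hook when it started: either nvidia-container-toolkit is not installed, or Docker was not restarted after installing it. A minimal sketch for checking the first case; the helper name missing_cmds is hypothetical, and the binary names passed to it below (nvidia-container-cli, nvidia-container-runtime-hook) are assumptions about what the NVIDIA packages install.

```shell
#!/bin/sh
# Print each command name from the arguments that is not found on PATH.
# Returns 0 only if every command was found.
# (Hypothetical helper; the NVIDIA binary names used with it are assumptions.)
missing_cmds() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, missing_cmds docker nvidia-container-cli nvidia-container-runtime-hook prints nothing when all three are on PATH. If nothing is missing but the error persists, restarting the daemon with sudo systemctl restart docker is a reasonable next step.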

I first tried solution 1 from a GitHub issue, which splits sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit into two separate commands:

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

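That had no effect. Because sudo apt-get update kept reporting deepin-related errors, I next tried to uninstall deepin. After trying many approaches, the following two steps finally stopped sudo apt-get update from showing wine-related errors:
 1. Run sudo apt remove deepin* to uninstall the installed wine software.
 2. Run find / -name '*wine*' to locate the remaining wine files on disk, then delete them.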

(base) wlj@wlj-OUC:~$ sudo find / -name '*wine*'
find: ‘/run/user/1000/gvfs’: Permission denied
/etc/apt/sources.list.d/deepin-wine.i-m.dev.list.save
/etc/apt/sources.list.d/deepin-wine.i-m.dev.list
/etc/apt/preferences.d/deepin-wine.i-m.dev.pref
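The leftovers that kept apt-get update failing were repository and pinning files under /etc/apt, as the find output above shows, so a narrower search is enough and much faster than scanning the whole disk. A minimal sketch; the function name find_apt_leftovers and its optional second argument (an alternative apt root, handy for testing) are hypothetical.

```shell
#!/bin/sh
# List apt repository (sources.list.d) and pinning (preferences.d) files
# whose names contain PATTERN. The apt root defaults to /etc/apt.
# (Hypothetical helper; review the output before deleting anything.)
find_apt_leftovers() {
  pattern=$1
  apt_root=${2:-/etc/apt}
  # -maxdepth 1: only look directly inside the two directories.
  find "$apt_root/sources.list.d" "$apt_root/preferences.d" \
       -maxdepth 1 -type f -name "*${pattern}*" 2>/dev/null | sort
}
```

Running find_apt_leftovers wine should list files like the three shown above; they can then be removed individually with sudo rm after checking each one.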

After that, sudo apt-get update ran without errors. I then went through NVIDIA's tutorial again from the start, and the test passed:

sudo docker run --gpus 2 nvidia/cuda:10.0-base nvidia-smi
Wed May  4 02:11:20 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0 Off |                  N/A |
| 29%   61C    P0    45W / 175W |      0MiB /  7952MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 2070    Off  | 00000000:03:00.0 Off |                  N/A |
| 38%   41C    P0     1W / 175W |      0MiB /  7952MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
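Here --gpus 2 asks Docker to expose two GPUs to the container (--gpus all would expose every GPU), and both RTX 2070 cards show up in the table. To check this from a script rather than by reading the table, nvidia-smi -L prints one "GPU <index>: ..." line per device, which is easy to count. A minimal sketch; the helper name count_gpus is hypothetical.

```shell
#!/bin/sh
# Count the GPUs listed in `nvidia-smi -L` output read from stdin.
# Each device appears on a line starting with "GPU <index>:".
# grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps
# the function's exit status at 0 in that case.
count_gpus() {
  grep -c '^GPU [0-9][0-9]*:' || true
}
```

For example, sudo docker run --rm --gpus 2 nvidia/cuda:10.0-base nvidia-smi -L | count_gpus should print 2 on the two-GPU machine above.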

However, running nvidia-docker still failed at this point.
II.

(base) wlj@wlj-OUC:~$ nvidia-docker
nvidia-docker: command not found

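It turned out that the tutorial linked above is incomplete. The official nvidia-docker repository on GitHub links to an installation guide; from it I ran only the following three commands, after which nvidia-docker worked.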
 1. Install Docker:

curl https://get.docker.com | sh \
  && sudo systemctl --now enable docker

 2. Set up NVIDIA's package repository and its GPG key:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

 3. Install nvidia-docker2:

sudo apt-get install -y nvidia-docker2

At this point, nvidia-docker works.
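The repository step builds its list file from two pieces: $ID$VERSION_ID read from /etc/os-release picks the distribution path (ubuntu18.04 on the server in this post, matching the URL in its error message), and the sed call adds a signed-by option so apt accepts only the dearmored NVIDIA key for that repository. The sketch below reproduces both pieces as shell functions, so the result can be inspected before anything is written under /etc/apt. The function names distribution_of and add_signed_by are hypothetical.

```shell
#!/bin/sh
# Keyring path used by the repository step above.
KEYRING=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Print $ID$VERSION_ID from an os-release file (default /etc/os-release).
# The file is sourced in a subshell so its variables do not leak out.
# Pass an absolute path: "." searches PATH for names without a slash.
distribution_of() {
  ( . "${1:-/etc/os-release}"; echo "$ID$VERSION_ID" )
}

# Read apt source lines on stdin and insert the signed-by option into
# every "deb https://" entry, mirroring the sed call above.
add_signed_by() {
  sed "s#deb https://#deb [signed-by=$KEYRING] https://#g"
}
```

For example, curl -s -L https://nvidia.github.io/libnvidia-container/$(distribution_of)/libnvidia-container.list | add_signed_by prints what the repository step would write, without the sudo tee.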
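When I installed on a server, I ran into an error in part II that had not occurred on my local machine: after running the commands in II.1 and II.2, I ran sudo apt-get update, and it failed with: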

(base) ouc@ouc-Super-Server:~$ sudo apt-get update
E: Conflicting values set for option Signed-By regarding source https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64/ /: /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg != 
E: The list of sources could not be read.

Following a solution I found for this error, I deleted the NVIDIA repository files in /etc/apt/sources.list.d:

(base) ouc@ouc-Super-Server:/etc/apt/sources.list.d$ ls
mysql.list                     rvm-ubuntu-smplayer-bionic.list       vscode.list
mysql.list.save                rvm-ubuntu-smplayer-bionic.list.save  vscode.list.save
nvidia-container-toolkit.list  teamviewer.list
nvidia-docker.list             teamviewer.list.save
(base) ouc@ouc-Super-Server:/etc/apt/sources.list.d$ sudo rm nvidia-*
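apt reports "Conflicting values set for option Signed-By" when the same repository is declared more than once with different signed-by values; the empty value after != suggests one declaration had no signed-by at all. The listing above contains both nvidia-container-toolkit.list (written by step II.2 with signed-by) and an older nvidia-docker.list, which most likely declared the same NVIDIA repository without it. Instead of deleting every nvidia-* file, a more targeted check is to list which files declare the repository and whether each entry is signed. A minimal sketch; the helper name repo_entries is hypothetical, and it only looks at lines that start with "deb".

```shell
#!/bin/sh
# For every *.list file in DIR (default /etc/apt/sources.list.d), print
# "<file><TAB>signed" or "<file><TAB>unsigned" for each deb line that
# mentions URL. (Hypothetical helper, plain text matching only.)
repo_entries() {
  url=$1
  dir=${2:-/etc/apt/sources.list.d}
  for f in "$dir"/*.list; do
    [ -f "$f" ] || continue   # the glob matched nothing
    # "|| [ -n "$line" ]" keeps a last line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
      case $line in
        deb*"$url"*)
          case $line in
            *signed-by=*) printf '%s\tsigned\n' "${f##*/}" ;;
            *)            printf '%s\tunsigned\n' "${f##*/}" ;;
          esac
          ;;
      esac
    done < "$f"
  done
}
```

Running repo_entries nvidia.github.io/libnvidia-container shows each declaring file; an unsigned entry next to a signed one points at the file to remove or fix.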

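I then went through the whole procedure again from the beginning; at step 2 I went straight to installing nvidia-docker2, and it succeeded. Strange, but it worked.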

(base) ouc@ouc-Super-Server:/etc/apt/sources.list.d$ sudo apt-get install -y nvidia-docker2
Reading package lists... Done
Building dependency tree       
Reading state information... Done
(Reading database ... 248522 files and directories currently installed.)
Preparing to unpack .../nvidia-docker2_2.10.0-1_all.deb ...
Unpacking nvidia-docker2 (2.10.0-1) ...
Setting up nvidia-docker2 (2.10.0-1) ...

OK, nvidia-docker is now installed on the server as well.
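As far as I know, the nvidia-docker2 package provides the nvidia-docker wrapper command and registers an "nvidia" runtime in /etc/docker/daemon.json, and NVIDIA's guide restarts the daemon afterwards (sudo systemctl restart docker) so the runtime is picked up. A quick post-install check is to confirm the runtime is declared. This grep-based sketch (the helper name has_nvidia_runtime is hypothetical) is deliberately simple and does not parse JSON.

```shell
#!/bin/sh
# Succeed if the Docker daemon config declares a runtime named "nvidia".
# Plain text matching, not a JSON parser: it only checks that a
# "runtimes" key and an "nvidia" key both appear in the file.
has_nvidia_runtime() {
  conf=${1:-/etc/docker/daemon.json}
  [ -r "$conf" ] || return 1
  grep -q '"runtimes"' "$conf" && grep -q '"nvidia"[[:space:]]*:' "$conf"
}
```

After the restart, docker info should also list nvidia among its runtimes.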

When reprinting, please credit: this article is reprinted from www.mshxw.com
Article URL: https://www.mshxw.com/it/860209.html