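Summary: On Windows 10, starting a non-GPU application container in Docker fails with the error libnvidia-ml.so.1: file exists: unknown. The same image runs fine in other Linux environments, and MySQL and ZooKeeper (zk) containers run in WSL without any problem.
Below is the history of the two images.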
10.10.3.5/cta/java-egl:01
IMAGE  CREATED  CREATED BY  SIZE  COMMENT
sha256:cf0a449929dbbeadd303ac4e4d9558bbb8ed35705ee0698b590c199f9da20c73  8 weeks ago  /bin/sh -c #(nop) ENV PATH=/opt/jdk1.8.0_151/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin  0B
8 weeks ago  /bin/sh -c #(nop) ENV CLASSPATH=.:/opt/jdk1.8.0_151/lib:/opt/jdk1.8.0_151/jre/lib  0B
8 weeks ago  /bin/sh -c #(nop) ENV JRE_HOME=/opt/jdk1.8.0_151/jre  0B
8 weeks ago  /bin/sh -c #(nop) ENV JAVA_HOME=/opt/jdk1.8.0_151  0B
8 weeks ago  /bin/sh -c #(nop) ENV LANG=C.UTF-8  0B
8 weeks ago  /bin/sh -c #(nop) ENV LC_ALL=C.UTF-8  0B
8 weeks ago  /bin/sh -c rm -rf /etc/localtime && ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && rm -rf /var/lib/apt/lists...
...exit 0/' /sbin/initctl && echo 'force-unsafe-io' > /etc/dpkg/dpkg.cfg.d/docker-apt-speedup && echo 'DPkg::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };' > /etc/apt/apt.conf.d/docker-clean && echo 'APT::Update::Post-Invoke { "rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true"; };' >> /etc/apt/apt.conf.d/docker-clean && echo 'Dir::Cache::pkgcache ""; Dir::Cache::srcpkgcache "";' >> /etc/apt/apt.conf.d/docker-clean && echo 'Acquire::Languages "none";' > /etc/apt/apt.conf.d/docker-no-languages && echo 'Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";' > /etc/apt/apt.conf.d/docker-gzip-indexes && echo 'Apt::AutoRemove::SuggestsImportant "false";' > /etc/apt/apt.conf.d/docker-autoremove-suggests  745B
8 months ago  /bin/sh -c #(nop) ADD file:11b425d4c08e81a3e0cb2e0345d27cd5fc844dd83f1096af4cc05f635824ff5d in /  135MB
Comparing the output of docker history image --no-trunc for both images shows that their build commands and files are identical.
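Comparing two long history listings by eye is error-prone, so the comparison can be sketched as a diff of the CREATED BY column only. This is a minimal sketch, assuming a POSIX shell and that both images are available locally; the function names history_cmds and compare_history are made up for illustration.

```shell
# Print only the CREATED BY column of an image's history, untruncated.
history_cmds() {
    docker history --no-trunc --format '{{.CreatedBy}}' "$1"
}

# Diff the build commands of two images.
# Exit status 0 means the listings are identical, 1 means they differ.
compare_history() {
    a=$(mktemp) && b=$(mktemp) || return 2
    history_cmds "$1" > "$a"
    history_cmds "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Example:
#   compare_history 10.10.3.5/cta/java-egl:01 egl-test:0508
```

Identical output only shows that the build commands match. It says nothing about what each layer actually contains, which appears to be where these two images differ.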
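egl-test:0508 is an image I rebuilt on a machine that has no GPU and no GPU driver installed.
10.10.3.5/cta/java-egl:01 is an image I built a long time ago. Both images come from the same Dockerfile, but the GPU drivers on the machines that built them are different.
I copied the Dockerfile folder to a machine that does have a GPU driver installed and built it there. Absurdly, the problem reproduced.
Building the image with --no-cache gives the same problem.
I began to suspect Docker itself, so I modified daemon.json.
Original daemon.json: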
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-shm-size": "2G",
    "insecure-registries": ["harbor.deepwise.com", "10.10.3.5", "172.28.3.5"],
    "graph": "/data1/docker/lib/docker"
}
After the change:
{
    "graph": "/data1/docker/lib/docker"
}
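Solved. The NVIDIA-related entries that had been showing up a moment ago are gone. This points to the nvidia default runtime as the cause: with it enabled, the driver libraries are presumably also mounted into the containers used during docker build and end up recorded in the image.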
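Changes to daemon.json only take effect after the Docker daemon restarts, so it can help to confirm which runtime is actually the default afterwards. The following is a minimal sketch, assuming a Linux host managed by systemd; the helper name default_runtime_is_nvidia is hypothetical.

```shell
# Restart the daemon so the edited daemon.json is picked up (systemd hosts):
#   sudo systemctl restart docker

# Succeeds (exit 0) when the daemon's default runtime is "nvidia".
default_runtime_is_nvidia() {
    runtime=$(docker info --format '{{.DefaultRuntime}}') || return 2
    echo "default runtime: $runtime"
    [ "$runtime" = "nvidia" ]
}

# Example:
#   default_runtime_is_nvidia && echo "nvidia is still the default runtime"
```

If other settings from the original file, such as insecure-registries, are still needed, removing only the default-runtime key should have the same effect on the runtime. As long as the runtimes entry is kept, GPU containers can still opt in with docker run --runtime=nvidia.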
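After some searching, I found that other people have run into a similar problem.
Their workaround avoids rebuilding the image from scratch. daemon.json still has to be modified first; otherwise removing the file fails with device or resource busy.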
FROM <image>
RUN umount /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 && rm -rf /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 ...
The variant I verified is this one:
FROM <image>
RUN rm -rf /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so.1
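Before and after applying such a Dockerfile, it may be useful to check whether the image's filesystem really contains those files without starting a container, since starting one is exactly what fails. The sketch below assumes that docker create and docker export work for the image in question; the function name list_nvidia_libs is hypothetical, and the pattern only covers the two libraries named above.

```shell
# List stray NVIDIA library paths baked into an image's filesystem.
# Exit status 0 means at least one was found, 1 means none.
list_nvidia_libs() {
    # "true" is only a placeholder command; the container is never started.
    cid=$(docker create "$1" true) || return 2
    docker export "$cid" | tar -tf - | grep -E 'libnvidia-ml\.so\.1|libcuda\.so\.1'
    status=$?
    docker rm "$cid" >/dev/null
    return "$status"
}

# Example:
#   list_nvidia_libs 10.10.3.5/cta/java-egl:01
```

An image that prints nothing here no longer carries the leftover files.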
Reference issue:
https://github.com/NVIDIA/nvidia-docker/issues/1551


