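Problem description

On my old cluster, which runs K8S 1.14.3, I was testing nfs-csi. PVs were provisioned dynamically and bound to their PVCs without any problem, but the pod failed at the attach step. describe pod reported the following errors: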
Feb 16 14:49:36 slave7 kubelet[2523120]: I0216 14:49:36.658610 2523120 reconciler.go:227] operationExecutor.AttachVolume started for volume "pvc-dd48bec6-8ed9-11ec-9a69-fa163e25388c" (UniqueName: "kubernetes.io/csi/nfs.csi.inspur.com^192.168.1.201/pvc-dd48bec6-8ed9-11ec-9a69-fa163e25388c") pod "nexus-5bd6475fc5-zgh9q" (UID: "bfd510da-8ef3-11ec-9a69-fa163e25388c")
Feb 16 14:49:36 slave7 kubelet[2523120]: E0216 14:49:36.660467 2523120 csi_attacher.go:93] kubernetes.io/csi: attacher.Attach failed: volumeattachments.storage.k8s.io is forbidden: User "system:node:slave7" cannot create resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope: can only get individual resources of this type
Feb 16 14:49:36 slave7 kubelet[2523120]: E0216 14:49:36.660524 2523120 nestedpendingoperations.go:267] Operation for ""kubernetes.io/csi/nfs.csi.inspur.com^192.168.1.201/pvc-dd48bec6-8ed9-11ec-9a69-fa163e25388c"" failed. No retries permitted until 2022-02-16 14:51:38.660499769 +0800 CST m=+3307.695433396 (durationBeforeRetry 2m2s). Error: "AttachVolume.Attach failed for volume "pvc-dd48bec6-8ed9-11ec-9a69-fa163e25388c" (UniqueName: "kubernetes.io/csi/nfs.csi.inspur.com^192.168.1.201/pvc-dd48bec6-8ed9-11ec-9a69-fa163e25388c") from node "slave7" : volumeattachments.storage.k8s.io is forbidden: User "system:node:slave7" cannot create resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope: can only get individual resources of this type"
Feb 16 14:49:49 slave7 kubelet[2523120]: W0216 14:49:49.223559 2523120 kubelet_pods.go:849] Unable to retrieve pull secret harbor/service-registry for harbor/harbor-redis-redis4-ha-server-1 due to secret "service-registry" not found. The image pull may not succeed.
Feb 16 14:50:05 slave7 kubelet[2523120]: E0216 14:50:05.223754 2523120 kubelet.go:1665] Unable to mount volumes for pod "nexus-5bd6475fc5-zgh9q_cicd(bfd510da-8ef3-11ec-9a69-fa163e25388c)": timeout expired waiting for volumes to attach or mount for pod "cicd"/"nexus-5bd6475fc5-zgh9q". list of unmounted volumes=[nexus-data nexus-backup]. list of unattached volumes=[nexus-data nexus-backup default-token-nh7xp]; skipping pod
Feb 16 14:50:05 slave7 kubelet[2523120]: E0216 14:50:05.223799 2523120 pod_workers.go:190] Error syncing pod bfd510da-8ef3-11ec-9a69-fa163e25388c ("nexus-5bd6475fc5-zgh9q_cicd(bfd510da-8ef3-11ec-9a69-fa163e25388c)"), skipping: timeout expired waiting for volumes to attach or mount for pod "cicd"/"nexus-5bd6475fc5-zgh9q". list of unmounted volumes=[nexus-data nexus-backup]. list of unattached volumes=[nexus-data nexus-backup default-token-nh7xp]
The key error is this line:
kubernetes.io/csi: attacher.Attach failed: volumeattachments.storage.k8s.io is forbidden: User "system:node:slave7" cannot create resource "volumeattachments" in API group "storage.k8s.io" at the cluster scope: can only get individual resources of this type

Investigation
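When the kubelet log is noisy, a small filter can isolate authorization failures like the one quoted above. The following is only a sketch: forbidden_attach_errors is a hypothetical helper name, and the usage line assumes kubelet runs as a systemd unit named kubelet so that its log can be read with journalctl.

```shell
# Hypothetical helper: reads kubelet log lines on stdin and prints, for each
# failed CSI attach, who was denied which verb on which resource.
forbidden_attach_errors() {
  grep 'attacher.Attach failed' \
    | grep -o 'User "[^"]*" cannot [a-z]* resource "[^"]*" in API group "[^"]*"'
}

# Usage (assumes a systemd-managed kubelet):
#   journalctl -u kubelet --no-pager | forbidden_attach_errors | sort -u
```

Deduplicating with sort -u helps here because kubelet retries the attach and repeats the same message.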
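- The logs showed that the request never reached the nfs-csi plugin at all.
- The kubelet log contained the same error as above. Judging from the log, this looked like a permissions problem.
- Looking through the ClusterRoles, I found one named system:node.
- kubectl get ClusterRole system:node -o yaml produces very long output, so only the relevant part is shown here: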
...
- apiGroups:
  - storage.k8s.io
  resources:
  - volumeattachments
  verbs:
  - get
...
That explains it: system:node only has the get verb on volumeattachments in the storage.k8s.io group. It has no create permission, which is what produces the error above.
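One way to double-check this conclusion without reading the whole role is to ask the API server directly with kubectl auth can-i while impersonating the node identity. This is a minimal sketch: node_can is a hypothetical helper name, slave7 is the node from the log above, and impersonation via --as and --as-group requires that your own account is allowed to impersonate.

```shell
# Hypothetical helper: asks whether the identity of node $2 may perform
# verb $1 on volumeattachments in the storage.k8s.io group.
node_can() {
  kubectl auth can-i "$1" volumeattachments.storage.k8s.io \
    --as "system:node:$2" --as-group system:nodes
}

# Usage:
#   node_can create slave7
#   node_can get slave7
```

Running the same check again after changing the role is a quick way to confirm that the change took effect.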
Solution

The fix is simple: edit the system:node ClusterRole.
kubectl edit ClusterRole system:node

- apiGroups:
  - storage.k8s.io
  resources:
  - volumeattachments
  verbs:
  - create
  - delete
  - get
  - patch
  - update
- apiGroups:
  ...
After that, the problem was resolved.
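If an interactive edit is inconvenient (for example, when the change has to be scripted), a JSON patch is a possible alternative. The sketch below is not what I ran at the time; grant_node_volumeattachments is a hypothetical helper name. It appends a new rule to the role instead of modifying the existing get-only rule, which should be equivalent because RBAC rules are additive.

```shell
# Hypothetical helper: appends a rule to the system:node ClusterRole that
# grants the same verbs on volumeattachments as the manual edit.
grant_node_volumeattachments() {
  kubectl patch clusterrole system:node --type=json -p '[
    {"op": "add", "path": "/rules/-", "value": {
      "apiGroups": ["storage.k8s.io"],
      "resources": ["volumeattachments"],
      "verbs": ["create", "delete", "get", "patch", "update"]
    }}
  ]'
}

# Usage:
#   grant_node_volumeattachments
```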
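Follow-up

I later checked a K8S 1.20 cluster and found that this ClusterRole had already been changed there, so the problem is most likely that version 1.14 had not been adapted for this case.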
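To compare the rule between clusters, it is enough to print only the part of the role around volumeattachments. This is a rough sketch: show_volumeattachments_rule is a hypothetical helper name, the kubeconfig context names in the usage lines are placeholders, and the grep context sizes are a crude window that may include a line or two of the neighbouring rule.

```shell
# Hypothetical helper: prints the lines of the system:node ClusterRole
# surrounding "volumeattachments" for the kubeconfig context given as $1.
show_volumeattachments_rule() {
  kubectl --context "$1" get clusterrole system:node -o yaml \
    | grep -B 3 -A 6 'volumeattachments'
}

# Usage (context names are placeholders):
#   show_volumeattachments_rule old-1-14
#   show_volumeattachments_rule new-1-20
```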



