CSINode does not contain driver csi.hpe.com #70

Open
Gusymochis opened this issue Feb 18, 2025 · 17 comments

@Gusymochis

After installing and configuring the truenas-csp, I'm getting errors when attaching PVs to pods; the error seems to be "CSINode does not contain driver csi.hpe.com". I have checked the CSI driver for errors but can't see any. The attacher reports this error:

I0218 01:02:41.688579       1 controller.go:213] "Started VolumeAttachment processing" driver="csi.hpe.com" VolumeAttachment="csi-84dd4f19694a637c375fbfa835797af7a1e878599483245b8d4864f15bda9bfb"
I0218 01:02:41.688613       1 csi_handler.go:233] "CSIHandler: processing VolumeAttachment" driver="csi.hpe.com" VolumeAttachment="csi-84dd4f19694a637c375fbfa835797af7a1e878599483245b8d4864f15bda9bfb"
I0218 01:02:41.688622       1 csi_handler.go:261] "Attaching" driver="csi.hpe.com" VolumeAttachment="csi-84dd4f19694a637c375fbfa835797af7a1e878599483245b8d4864f15bda9bfb"
I0218 01:02:41.688629       1 csi_handler.go:434] "Starting attach operation" driver="csi.hpe.com" VolumeAttachment="csi-84dd4f19694a637c375fbfa835797af7a1e878599483245b8d4864f15bda9bfb"
I0218 01:02:41.688663       1 csi_handler.go:347] "PersistentVolume finalizer is already set" driver="csi.hpe.com" VolumeAttachment="csi-84dd4f19694a637c375fbfa835797af7a1e878599483245b8d4864f15bda9bfb" PersistentVolume="pvc-3c76972d-cd47-4adb-9a2c-b28e7e941324"
I0218 01:02:41.692376       1 csi_handler.go:768] "Failed to get nodeID from CSINode" driver="csi.hpe.com" VolumeAttachment="csi-84dd4f19694a637c375fbfa835797af7a1e878599483245b8d4864f15bda9bfb" nodeName="<node>" err="CSINode <node> does not contain driver csi.hpe.com"
I0218 01:02:41.692398       1 csi_handler.go:606] "Saving attach error" driver="csi.hpe.com" VolumeAttachment="csi-84dd4f19694a637c375fbfa835797af7a1e878599483245b8d4864f15bda9bfb"
I0218 01:02:41.697361       1 csi_handler.go:617] "Saved attach error" driver="csi.hpe.com" VolumeAttachment="csi-84dd4f19694a637c375fbfa835797af7a1e878599483245b8d4864f15bda9bfb"
I0218 01:02:41.697507       1 csi_handler.go:243] "Error processing" driver="csi.hpe.com" VolumeAttachment="csi-84dd4f19694a637c375fbfa835797af7a1e878599483245b8d4864f15bda9bfb" err="failed to attach: CSINode <node> does not contain driver csi.hpe.com"
I0218 01:02:41.697584       1 controller.go:167] "Ignoring VolumeAttachment change" driver="csi.hpe.com" VolumeAttachment="csi-84dd4f19694a637c375fbfa835797af7a1e878599483245b8d4864f15bda9bfb"

The CSI driver init pod provides this output:

Node conformance checks are disabled
Node configuration is disabled
time="2025-02-18T00:58:34Z" level=info msg="Initialized logging." alsoLogToStderr=true logFileLocation=/var/log/hpe-csi-controller.log logLevel=info
time="2025-02-18T00:58:34Z" level=info msg="**********************************************" file="csi-driver.go:56"
time="2025-02-18T00:58:34Z" level=info msg="*************** HPE CSI DRIVER ***************" file="csi-driver.go:57"
time="2025-02-18T00:58:34Z" level=info msg="**********************************************" file="csi-driver.go:58"
time="2025-02-18T00:58:34Z" level=info msg=">>>>> CMDLINE Exec, args: []" file="csi-driver.go:60"
time="2025-02-18T00:58:34Z" level=info msg="Skipping node configuration, DISABLE_NODE_CONFIGURATION=true. All block storage services needs to be installed and configured manually." file="csi-driver.go:132"
time="2025-02-18T00:58:34Z" level=info msg=">>>>> node init container " file="nodeinit.go:38"
time="2025-02-18T00:58:34Z" level=info msg="Found 0 multipath devices []" file="multipath.go:423"
time="2025-02-18T00:58:34Z" level=info msg="No multipath devices found on this node ." file="utils.go:45"

The hpe-csi-driver pod shows the following many times:

time="2025-02-18T01:00:06Z" level=info msg="Node monitor started monitoring the node <node>" file="nodemonitor.go:101"
time="2025-02-18T01:00:06Z" level=info msg="Found 0 multipath devices []" file="multipath.go:423"
time="2025-02-18T01:00:06Z" level=info msg="No multipath devices found on this node <node>." file="utils.go:45"

The csi-node-driver-registrar shows the following:

I0218 00:58:36.870491       1 main.go:164] Calling CSI driver to discover driver name
I0218 00:58:36.870519       1 connection.go:244] GRPC call: /csi.v1.Identity/GetPluginInfo
I0218 00:58:36.870526       1 connection.go:245] GRPC request: {}
I0218 00:58:36.874824       1 connection.go:251] GRPC response: {"name":"csi.hpe.com","vendor_version":"1.3"}
I0218 00:58:36.874841       1 connection.go:252] GRPC error: <nil>
I0218 00:58:36.874852       1 main.go:173] CSI driver name: "csi.hpe.com"
I0218 00:58:36.874890       1 node_register.go:55] Starting Registration Server at: /registration/csi.hpe.com-reg.sock
I0218 00:58:36.875129       1 node_register.go:64] Registration Server started at: /registration/csi.hpe.com-reg.sock
I0218 00:58:36.875201       1 node_register.go:88] Skipping HTTP server because endpoint is set to: ""

The node is running Ubuntu 24.04.
I have verified that all the required packages for iSCSI, multipath, XFS, and NFSv4 are installed. I have also tried enabling nodeConformance and node configuration, but nothing changed. I think the issue may be related to the "Found 0 multipath devices" info log.

@datamattsson
Collaborator

Strange. What does kubectl get csinodes -o yaml say?

@Gusymochis
Author

Yes, this is the output:

apiVersion: v1
items:
- apiVersion: storage.k8s.io/v1
  kind: CSINode
  metadata:
    annotations:
      storage.alpha.kubernetes.io/migrated-plugins: kubernetes.io/aws-ebs,kubernetes.io/azure-disk,kubernetes.io/azure-file,kubernetes.io/cinder,kubernetes.io/gce-pd,kubernetes.io/portworx-volume,kubernetes.io/vsphere-volume
    creationTimestamp: "2024-01-16T02:56:42Z"
    name: <node>
    ownerReferences:
    - apiVersion: v1
      kind: Node
      name: <node>
      uid: 601b1edc-1be8-4054-a5bb-997bdea8a730
    resourceVersion: "29281389"
    uid: 95088258-7fae-4085-be2e-fef6d1f0d3ed
  spec:
    drivers:
    - name: nfs.csi.k8s.io
      nodeID: <node>
      topologyKeys: null
kind: List
metadata:
  resourceVersion: ""

@datamattsson
Collaborator

Hmm, no node driver. What does kubectl get pods -o wide -n hpe-storage say?

@Gusymochis
Author

NAME                                 READY   STATUS    RESTARTS   AGE   IP             NODE     NOMINATED NODE   READINESS GATES
hpe-csi-controller-d59876d87-ctqqx   9/9     Running   0          16h   10.0.8.2       <node>   <none>           <none>
hpe-csi-node-ngwmf                   2/2     Running   0          16h   10.0.8.2       <node>   <none>           <none>
truenas-csp-697759dbb5-j8qzj         1/1     Running   0          16h   10.244.0.171   <node>   <none>           <none>

All pods seem to be running with no issues.

@datamattsson
Collaborator

Are you using any kind of exotic Kubernetes distribution or host OS? What if you restart the CSI node driver DaemonSet, will csinodes populate?
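
For reference, a restart and re-check could look something like this (assuming the DaemonSet is named hpe-csi-node, matching the pod name above):

# restart the CSI node plugin and see whether the driver registers in the CSINode object
kubectl -n hpe-storage rollout restart daemonset/hpe-csi-node
kubectl get csinode <node> -o yaml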

@Gusymochis
Author

I'm running a vanilla Ubuntu server with k0s as the Kubernetes distribution. The issue, though, looks very similar to one described here:
democratic-csi/democratic-csi#86
where the open-iscsi service is not running due to:

├─ ConditionDirectoryNotEmpty=|/etc/iscsi/nodes was not met
└─ ConditionDirectoryNotEmpty=|/sys/class/iscsi_session was not met
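
For reference, those conditions can be checked directly on the node with something like:

# is the iSCSI daemon actually running?
systemctl status iscsid open-iscsi
# the open-iscsi unit only starts when one of these directories is non-empty
ls /etc/iscsi/nodes /sys/class/iscsi_session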

@datamattsson
Collaborator

K0s uses a non-standard path for the Kubelet: https://docs.k0sproject.io/v1.31.3+k0s.0/storage/#installing-3rd-party-storage-solutions

You need to tell the HPE CSI Driver chart this. Install the TrueNAS CSP chart with --set hpe-csi-driver.kubeletRootDir=/var/lib/k0s/kubelet.
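
For reference, a full install along those lines might look something like this (chart repo and values as described in the TrueNAS CSP INSTALL.md; adjust the release name and namespace to your setup):

helm repo add truenas-csp https://hpe-storage.github.io/truenas-csp
helm install truenas-csp truenas-csp/truenas-csp \
  --create-namespace -n hpe-storage \
  --set hpe-csi-driver.kubeletRootDir=/var/lib/k0s/kubelet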

There are most likely other challenges that might crop up, but start there.

@Gusymochis
Author

Gusymochis commented Feb 20, 2025

Excellent!
I can see the csi-driver now. However, I seem to be facing another issue: pods are stuck pending with the following error:

MountVolume.MountDevice failed for volume "pvc-036a92cf-7169-43b1-8d0e-601cd32b613b" : rpc error: code = Internal desc = Failed to stage volume Default_Apps_iSCSI_pvc-036a92cf-7169-43b1-8d0e-601cd32b613b, err: rpc error: code = Internal desc = Error creating device for volume Default_Apps_iSCSI_pvc-036a92cf-7169-43b1-8d0e-601cd32b613b, err: device not found with serial 6589cfc0000004fe485ed74e93e57fc8 or target

The hpe-csi-driver pod shows this:

time="2025-02-20T01:21:51Z" level=error msg="command iscsiadm failed with rc=21 err=iscsiadm: No portals found\n" file="iscsi.go:764"
time="2025-02-20T01:21:51Z" level=error msg="\n Error in GetSecondaryBackends unexpected end of JSON input" file="volume.go:87"
time="2025-02-20T01:21:51Z" level=error msg="\n Passed details " file="volume.go:88"
time="2025-02-20T01:21:51Z" level=error msg="\n Error in GetSecondaryBackends unexpected end of JSON input" file="volume.go:87"
time="2025-02-20T01:21:51Z" level=error msg="\n Passed details " file="volume.go:88"
time="2025-02-20T01:21:52Z" level=error msg="\n Error in GetSecondaryArrayLUNIds unexpected end of JSON input" file="volume.go:29"
time="2025-02-20T01:21:53Z" level=error msg="\n Error in GetSecondaryArrayLUNIds unexpected end of JSON input" file="volume.go:29"
time="2025-02-20T01:21:58Z" level=error msg="\n Error in GetSecondaryArrayLUNIds unexpected end of JSON input" file="volume.go:29"
time="2025-02-20T01:21:58Z" level=error msg="\n Error in GetSecondaryArrayLUNIds unexpected end of JSON input" file="volume.go:29"
time="2025-02-20T01:22:03Z" level=error msg="\n Error in GetSecondaryArrayLUNIds unexpected end of JSON input" file="volume.go:29"
time="2025-02-20T01:22:03Z" level=error msg="\n Error in GetSecondaryArrayLUNIds unexpected end of JSON input" file="volume.go:29"
time="2025-02-20T01:22:08Z" level=error msg="\n Error in GetSecondaryArrayLUNIds unexpected end of JSON input" file="volume.go:29"
time="2025-02-20T01:22:08Z" level=error msg="\n Error in GetSecondaryArrayLUNIds unexpected end of JSON input" file="volume.go:29"
time="2025-02-20T01:22:13Z" level=error msg="\n Error in GetSecondaryArrayLUNIds unexpected end of JSON input" file="volume.go:29"
time="2025-02-20T01:22:13Z" level=error msg="\n Error in GetSecondaryArrayLUNIds unexpected end of JSON input" file="volume.go:29"
time="2025-02-20T01:22:13Z" level=info msg="Node monitor started monitoring the node <node>" file="nodemonitor.go:101"
time="2025-02-20T01:22:13Z" level=info msg="Found 0 multipath devices []" file="multipath.go:423"
time="2025-02-20T01:22:13Z" level=info msg="No multipath devices found on this node <node>." file="utils.go:45"
time="2025-02-20T01:22:18Z" level=error msg="\n Error in GetSecondaryArrayLUNIds unexpected end of JSON input" file="volume.go:29"
time="2025-02-20T01:22:18Z" level=error msg="\n Error in GetSecondaryArrayLUNIds unexpected end of JSON input" file="volume.go:29"
time="2025-02-20T01:22:23Z" level=error msg="process with pid : 861 finished with error = exit status 21" file="cmd.go:63"

@datamattsson
Collaborator

"iscsiadm: No portals found" is suspicious. Have you set up your TrueNAS correctly? https://github.com/hpe-storage/truenas-csp/blob/master/INSTALL.md#configure-truenasfreenas

@Gusymochis
Author

The portal looks OK:

[screenshot of the TrueNAS portal configuration]
Also, the node is able to reach the iSCSI target:

nc -zv 10.0.0.7 3260
Connection to 10.0.0.7 3260 port [tcp/iscsi-target] succeeded!

@datamattsson
Collaborator

The error that bubbles up to the PVC events is the outcome of the CSI node driver being unable to find the multipath device on the node. This can be caused by a number of things. Can you manually create a volume, manually expose the target to the node, and discover the device to ensure the data path is sane?
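
For reference, a manual sanity check of the data path could look something like this on the node (portal IP and IQN are placeholders; use the ones from your TrueNAS box):

# discover and log in to the target manually
sudo iscsiadm -m discovery -t st -p <portal-ip>:3260
sudo iscsiadm -m node -T <target-iqn> -p <portal-ip>:3260 --login
# the LUN should then show up as a multipath device
sudo multipath -ll
lsblk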

@Gusymochis
Copy link
Author

[screenshot]
Doing a discovery shows:

sudo iscsiadm -m discoverydb -t st -p 10.0.0.7:3260 --discover
10.0.0.7:3260,1 iqn.2011-08.org.truenas.ctl:test

@datamattsson
Collaborator

Are you certain you're not using DHCP addresses on the TrueNAS target?

@datamattsson
Collaborator

Is this: 10.0.0.7:3260,1 iqn.2011-08.org.truenas.ctl:test appearing on the host as a multipath device? multipath -ll

@Gusymochis
Author

Yes:

mpathb (36589cfc0000004d6df45aab20d882925) dm-5 TrueNAS,iSCSI Disk
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 0:0:0:0 sda 8:0 active ready running

@Gusymochis
Author

Found the Issue!

The main issue seems to be that the default authorized networks are set to the CIDR of the TrueNAS server itself. In my case the Kubernetes node is in a different VLAN, which didn't match the authorized networks. After manually changing the authorized networks to the correct one, everything mounted as expected. I also noticed in the installation instructions that authorized networks can be added as a parameter on the StorageClass, but after adding it nothing changed. Looking at the CSP logs, I saw it was failing to validate the networks I had added:

Mon, 24 Feb 2025 07:11:35 +0000 backend ERROR Exception: Traceback (most recent call last):
  File "/app/backend.py", line 567, in create_target
    req_backend['auth_networks'] = self.auth_networks_validate(custom_networks)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/backend.py", line 728, in auth_networks_validate
    if ipaddress.ip_network(cidr):
       ^^^^^^^^^
NameError: name 'ipaddress' is not defined. Did you forget to import 'ipaddress'

I submitted a PR which should solve this here: #71

@datamattsson
Collaborator

Found the Issue!

Thanks for reporting this. Routable data networks are not something that is well tested. I'll have a look at this for the next version, which should be soon.
