Unable to auto-scale Kubernetes cluster

Hi!

I am unable to auto-scale Kubernetes clusters. As I understand it, enabling auto-scaling creates a "cluster-autoscaler" deployment that decides whether to scale or not. However, it does not seem to work: the pod logs multiple errors and warnings, even though this is a completely clean cluster.

Normal scaling seems to work just fine.

# Setup
A "default" CloudStack **4.18** setup running KVM hosts.

## Settings (relevant)
- Cloud kubernetes service enabled **true**
- Cloud kubernetes cluster experimental features enabled **true**
- Cloud kubernetes cluster max size **50**

The nodes use the following service offering:
- 2 CPU x 2.05 GHz
- 2048 MB memory
- 8 GB root disk

# Replicate
1. Create a new cluster using the Kubernetes 1.24 ISO found here:
http://download.cloudstack.org/cks/

2. Enable forced auto-scaling
Since the cluster starts with only one worker node, auto-scaling with 3-5 nodes should trigger an upscale (I assume) 
![Screenshot from 2023-08-07 16-55-00](https://github.com/apache/cloudstack-kubernetes-provider/assets/26722370/9b92cc88-b107-42cb-8268-4ae3af25c1f6)

3. Check the logs for cluster-autoscaler in the Kubernetes cluster
Some notable entries:
```
E0807 14:41:30.317148       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope

E0807 14:41:32.388828       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: csistoragecapacities.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource "csistoragecapacities" in API group "storage.k8s.io" at the cluster scope
```
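To make it easier to see which permissions the autoscaler's service account appears to be missing, here is a rough sketch that pulls the denied verb, resource, and API group out of "forbidden" entries like the two above. The function name `missing_permissions` and the regular expression are my own illustration, not part of cluster-autoscaler or CloudStack, and the pattern is only checked against the message format shown in this log.

```python
import re

# Pattern for the "forbidden" message text as it appears in the log above.
# Assumption: other entries in the full log follow the same wording.
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)"'
)


def missing_permissions(log_text):
    """Return sorted, de-duplicated (user, verb, resource, api_group) tuples."""
    found = set()
    for match in FORBIDDEN_RE.finditer(log_text):
        found.add((match["user"], match["verb"], match["resource"], match["group"]))
    return sorted(found)


# The two entries quoted above, copied verbatim.
sample = '''
E0807 14:41:30.317148       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
E0807 14:41:32.388828       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: csistoragecapacities.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:cluster-autoscaler" cannot list resource "csistoragecapacities" in API group "storage.k8s.io" at the cluster scope
'''

for user, verb, resource, group in missing_permissions(sample):
    print(f"{user}: cannot {verb} {resource} in API group {group}")
```

Running this against the full attached log should give a short list of resources that the ClusterRole bound to the `cluster-autoscaler` service account would presumably need to cover.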
Even though I have not edited anything myself (just a clean CKS cluster), I get these weird logs:

```
W0807 14:41:43.251280       1 clusterstate.go:590] Failed to get nodegroup for 6a4c91a3-9694-4596-9ddd-dc86e60136ff: Unable to find node 6a4c91a3-9694-4596-9ddd-dc86e60136ff in cluster

W0807 14:41:43.251361       1 clusterstate.go:590] Failed to get nodegroup for bd0b855f-6dc6-4678-9bea-b52329333024: Unable to find node bd0b855f-6dc6-4678-9bea-b52329333024 in cluster

I0807 14:57:06.667061       1 static_autoscaler.go:341] 2 unregistered nodes present
```
The IDs are correct in CloudStack.
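In case it helps with cross-checking, this is a small sketch of how the node IDs can be collected from the "Failed to get nodegroup" warnings so they can be compared by hand against the VM IDs in CloudStack. The helper name `nodegroup_warning_ids` is hypothetical, and the pattern assumes the IDs are always 36-character UUIDs as in the lines above.

```python
import re

# Assumption: every warning of this kind carries a lowercase UUID, as in the log above.
NODEGROUP_RE = re.compile(r"Failed to get nodegroup for (?P<id>[0-9a-f-]{36}):")


def nodegroup_warning_ids(log_text):
    """Return the unique IDs from nodegroup warnings, in order of first appearance."""
    seen = []
    for match in NODEGROUP_RE.finditer(log_text):
        if match["id"] not in seen:
            seen.append(match["id"])
    return seen


# The warnings quoted above, copied verbatim.
sample = '''
W0807 14:41:43.251280       1 clusterstate.go:590] Failed to get nodegroup for 6a4c91a3-9694-4596-9ddd-dc86e60136ff: Unable to find node 6a4c91a3-9694-4596-9ddd-dc86e60136ff in cluster
W0807 14:41:43.251361       1 clusterstate.go:590] Failed to get nodegroup for bd0b855f-6dc6-4678-9bea-b52329333024: Unable to find node bd0b855f-6dc6-4678-9bea-b52329333024 in cluster
I0807 14:57:06.667061       1 static_autoscaler.go:341] 2 unregistered nodes present
'''

for node_id in nodegroup_warning_ids(sample):
    print(node_id)
```

The number of IDs found this way lines up with the "2 unregistered nodes present" entry.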

The entire log:
[logs-from-cluster-autoscaler-in-cluster-autoscaler-5bf887ddd8-hxg2g.log](https://github.com/apache/cloudstack-kubernetes-provider/files/12281530/logs-from-cluster-autoscaler-in-cluster-autoscaler-5bf887ddd8-hxg2g.log)

Please tell me if you need more logs to look at, or if I should try some other configuration.

Thanks!
