Skip to content

[BUG] AKS: AzureML Extension installation webhook call 10 seconds Timeout error #297

@Mykp3

Description

@Mykp3

Describe the bug
First attempt to install azureml extension version 1.1.35 is failing due to following timeout error:

Talled calling webnook "Valldate.ginx. ingress. Kuberheues. 10 Talled to
caLL wernook. rosT
"httos://azureml-inaress-nginx-controller-admission.azureml.svc:443/networkina/v1/ingresses?timeout=10s:
no endpoints available for service
"azurem-ingress-nginx-controller-admission"]ll occurred while doina the operation: Icreater on tne contig,

To Reproduce
Steps to reproduce the behaviour: AKS cluster that is actually populated with workloads (not blank) and attempt extension installation with SSL Enabled:

resource "null_resource" "test" {
  provisioner "local-exec" {
    command = " az k8s-extension create  --name ml-extension --auto-upgrade-minor-version false --extension-type Microsoft.AzureML.Kubernetes  --config enableTraining=True enableInference=True inferenceRouterServiceType=LoadBalancer allowInsecureConnections=False  --config-protected sslCertPemFile=/tmp/tls.pem sslKeyPemFile=/tmp/key.pem sslCname=${var.aml_settings.dns_record_name}.${var.aml_settings.dns_zone_name} --cluster-type managedClusters  --cluster-name ${var.cluster_name} --resource-group ${var.resource_group_name} --version 1.1.35”
  }
}

Expected behavior
A webhook call "Valldate.ginx. ingress." has a healthy 30 seconds timeout and some retries as we used to have in HTTPS REST world.

Screenshots
Log message, equivalent of the screenshot

Message: The extension operation failed with the following error:
Error: [
InnerError: [Helm installation failed: Unable to create/update Kubernetes resources for the extension: Recommendation Please check that there are no policies blocking the resource creation/update for the extension :
InnerError [release ml-extension failed, and has been uninstalled due to atomic being set: failed to create resource: Internal error occurred:
Talled calling webnook "Valldate.ginx. ingress. Kuberheues. 10 Talled to
caLL wernook. rosT
"httos://azureml-inaress-nginx-controller-admission.azureml.svc:443/networkina/v1/ingresses?timeout=10s:
no endpoints available for service
"azurem-ingress-nginx-controller-admission"]ll occurred while doina the operation: Icreater on tne contig,

Environment (please complete the following information):

  • Kubernetes versions 1.27, 1.28, 1.28.5 and more
    azure-cli 2.61.0 *

core 2.61.0 *
telemetry 1.1.0

Extensions:
k8s-extension 1.4.5
ml 2.22.0

Additional context
Second Installation usually succeeds.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions