
Conversation

@jshiwamV (Collaborator) commented Nov 23, 2025

  • terraform destroy fails with local-exec error #102: I was not able to reproduce the issue mentioned there, @jubrad. However, I have broken the one-liner command down into a set of commands that prepare the kubeconfig file properly before fetching the node claims.
  • Will raise follow-up PRs to properly document the prerequisites.

@jshiwamV jshiwamV requested a review from jubrad November 23, 2025 17:53
@jshiwamV jshiwamV marked this pull request as ready for review November 24, 2025 16:12
@alex-hunt-materialize (Contributor) left a comment

I don't see much point in writing the file. How does this fix things?

We should document that they need kubectl, though.

kubeconfig_file=$(mktemp)
echo "$${KUBECONFIG_DATA}" > "$${kubeconfig_file}"
trap "rm -f $${kubeconfig_file}" EXIT
@alex-hunt-materialize (Contributor) commented Nov 24, 2025

This trap line should happen before the write on the line above.
The file name should also be quoted inside the trap command.
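For example, a minimal sketch of the suggested ordering (assuming the same Terraform $$ escaping used in the snippet above):

kubeconfig_file=$(mktemp)
# Register cleanup before writing any data; single quotes defer expansion so the filename is quoted when the trap fires.
trap 'rm -f "$${kubeconfig_file}"' EXIT
echo "$${KUBECONFIG_DATA}" > "$${kubeconfig_file}"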

@jshiwamV (Collaborator, Author)

Yeah, I kind of suspected that doing everything in one line might be causing the issue. I will wait for @jubrad to reply, in case he knows the exact steps to reproduce this. Otherwise I would just get rid of this file change.

@alex-hunt-materialize (Contributor)

It seems that Kay can't reproduce it on this branch. I suspect that your initial thoughts were correct, and the process substitution isn't working the same on all machines.

Let's just fix the order of operations and the quoting here, and then we can merge this.

@kay-kim commented Dec 5, 2025

I was able to reproduce it again today off of the main branch (not with this branch, though). As an FYI, I am running examples/simple.

  1. Clone the Materialize Terraform repository:

    git clone https://github.com/MaterializeInc/materialize-terraform-self-managed.git
    cd materialize-terraform-self-managed/aws/examples/simple
  2. Create a terraform.tfvars file with the following variables:

    • name_prefix: Prefix for all resource names (e.g., simple-demo)
    • aws_region: AWS region for deployment (e.g., us-east-1)
    • aws_profile: AWS CLI profile to use
    • license_key: Materialize license key
    • tags: Map of tags to apply to resources

    For example:

    name_prefix = "simple-demo"
    aws_region  = "us-east-1"
    aws_profile = "your-aws-profile"
    license_key = "your-materialize-license-key"
    tags = {
      environment = "demo"
    }
  3. Initialize the Terraform directory to download the required providers and
    modules:

    terraform init
  4. Apply the Terraform configuration to create the infrastructure:

    terraform apply

    Terraform will prompt you to confirm the deployment. Type yes to proceed.

  5. I connected and did some stuff.

  6. Then to clean up, I did

    terraform destroy

    which returned

module.database.aws_security_group.database: Destroying... [id=sg-03547e532217e4cb6]
module.database.aws_security_group.database: Destruction complete after 0s
╷
│ Warning: Helm uninstall returned an information message
│ 
│ These resources were kept due to the resource policy:
│ [CustomResourceDefinition] certificaterequests.cert-manager.io
│ [CustomResourceDefinition] certificates.cert-manager.io
│ [CustomResourceDefinition] challenges.acme.cert-manager.io
│ [CustomResourceDefinition] clusterissuers.cert-manager.io
│ [CustomResourceDefinition] issuers.cert-manager.io
│ [CustomResourceDefinition] orders.acme.cert-manager.io
│ 
╵
╷
│ Error: local-exec provisioner error
│ 
│   with module.nodepool_materialize.terraform_data.destroyer,
│   on ../../modules/karpenter-nodepool/main.tf line 13, in resource "terraform_data" "destroyer":
│   13:   provisioner "local-exec" {
│ 
│ Error running command 'set -euo pipefail
│ nodeclaims=$(kubectl --kubeconfig <(echo "${KUBECONFIG_DATA}") get nodeclaims -l "karpenter.sh/nodepool=${NODEPOOL_NAME}" -o name)
│ if [ -n "${nodeclaims}" ]; then
│   echo "${nodeclaims}" | xargs kubectl --kubeconfig <(echo "${KUBECONFIG_DATA}") delete --wait=true
│ fi
': exit status 1. Output: E1205 09:37:24.943357   14318 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:24.945197   14318 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:24.946903   14318 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:24.948603   14318 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:24.950221   14318 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ The connection to the server localhost:8080 was refused - did you specify the right host or port?
│ 
╵
╷
│ Error: local-exec provisioner error
│ 
│   with module.nodepool_generic.terraform_data.destroyer,
│   on ../../modules/karpenter-nodepool/main.tf line 13, in resource "terraform_data" "destroyer":
│   13:   provisioner "local-exec" {
│ 
│ Error running command 'set -euo pipefail
│ nodeclaims=$(kubectl --kubeconfig <(echo "${KUBECONFIG_DATA}") get nodeclaims -l "karpenter.sh/nodepool=${NODEPOOL_NAME}" -o name)
│ if [ -n "${nodeclaims}" ]; then
│   echo "${nodeclaims}" | xargs kubectl --kubeconfig <(echo "${KUBECONFIG_DATA}") delete --wait=true
│ fi
': exit status 1. Output: E1205 09:37:38.421980   14369 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:38.423914   14369 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:38.425586   14369 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:38.427157   14369 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:38.428754   14369 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ The connection to the server localhost:8080 was refused - did you specify the right host or port?
│ 
╵

@alex-hunt-materialize (Contributor)

This doesn't look like an issue with the command, but likely an issue with setting the KUBECONFIG_DATA variable. We shouldn't be pointing at localhost:8080, but at the address of the K8S API server in AWS.
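For example, one way to check which API server a given kubeconfig resolves to (a diagnostic sketch run directly under bash, so no Terraform $$ escaping; not part of the module):

# Print the server endpoint of the current context; an empty or unreadable kubeconfig makes kubectl fall back to its default of localhost:8080.
kubectl --kubeconfig <(echo "${KUBECONFIG_DATA}") config view --minify -o jsonpath='{.clusters[0].cluster.server}'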

@jubrad (Collaborator) commented Dec 9, 2025

This doesn't look like an issue with the command, but likely an issue with setting the KUBECONFIG_DATA variable. We shouldn't be pointing at localhost:8080, but at the address of the K8S API server in AWS.

Yeah, I'm not sure what would be pointing at localhost unless it's something in the local setup?

@jshiwamV (Collaborator, Author) commented Dec 9, 2025

Terraform will always point to the KUBECONFIG_DATA that we pass from aws/examples/simple/main.tf. Unless that is overridden, the API server endpoint shouldn't be pointing to localhost. Can you confirm what is inside the KUBECONFIG_DATA var? I tried to recreate this issue multiple times but was unable to reproduce it.

@jshiwamV (Collaborator, Author) commented Dec 9, 2025

And is this issue deterministic? Does it happen on every destroy?

@alex-hunt-materialize (Contributor)

I cannot reproduce the issue either. My KUBECONFIG_DATA looks like (after formatting):

{
  "apiVersion": "v1",
  "clusters": [
    {
      "cluster": {
        "certificate-authority-data": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURCVENDQWUyZ0F3SUJBZ0lJRXovb2xnZEVCVVF3RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TlRFeU1Ea3hNREl6TVRCYUZ3MHpOVEV5TURjeE1ESTRNVEJhTUJVeApFekFSQmdOVkJBTVRDbXQxWW1WeWJtVjBaWE13Z2dFaU1BMEdDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLCkFvSUJBUURDL0ZBWlRXV1lUcGRUM2czY081dHAwQWlVWWdTSGlyKzVaWC8vQW55RXpZQWF4N2I0cWtSUzNNenkKRDZBY2VxOWl2TWFPcU5oOXJYSEE5T25Vd0pOSnFrdldWTFduMlpVZ0haZXJNT2pZczI0ejgyNzdRcWFjckpxTQpuTlVNYlZIam1ZbmFoWE5OdkN4MDRldjg0SGFubTgvUldkSkxlQ210SjVWS1prWEp5MWY0VVpSWW1DNVZGL2Z5CmphMHMyR0FBaUE3a2tQVks1TlZBVmZjRVhTSzQxN3dKUFNUUS9xdUVQaXBrTGtISm5aQ2h3ZElxOHM4NlZQMjAKUFdlZGs4YkxlVFZJOVV4Ylo3ZUsvT2NEZFQ2Q2lzNHpvQTA2VnlPQXp6d2ZhZERoelYyMHNDd2tjVldjakd1VgphMk5FZk1Sb3ZFRGNGY1o5cThtSmlkbXF0ZkFUQWdNQkFBR2pXVEJYTUE0R0ExVWREd0VCL3dRRUF3SUNwREFQCkJnTlZIUk1CQWY4RUJUQURBUUgvTUIwR0ExVWREZ1FXQkJTa0pkK3B0amxCNDMxMEN0bzM5RFhoWGFIa0hUQVYKQmdOVkhSRUVEakFNZ2dwcmRXSmxjbTVsZEdWek1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQVM3b1lCQitDbApCMHNZRUdwNjF4QjNrUjRKTDN5SzJ1Wi9nMXVsQWE1VnlzcGQ3MTBFSlhLOXNTekVFQUp4cjU2amROZ055VW9NCkJyYlhEYnN1NkdKUU4yTW42UHY4MWZHbENyem5hUjlJNFpObUpsdXRqVDZGaFFnNHduY1hTcmhETG9mdjlJdm8Kdi82NVp0UnZibXF4Ui9acWhxVDI2amY3bG5ibWhOM0lOQVVOUEUxYVhPeWp5Zm1qSkVadlFxVVlaL2lYUDAzcwpBSkM3ai9SMHU1QzdlVWhybnFzakpDSUZCN1pFaTZMMENuY1F1RFppMVRZdERlaURwK1BQK2Ywd0ZHRnNwWHdFCnp6RlpJVVhQYW1QUWJLcXdZdTRobWJzd01nU0gzbGRJZ2FNTW85Sm5TSUxtS1NXaFBibGJNbmRuTmNNSnRVeHkKUzJHWXpENlAxWUNkCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K",
        "server": "https://ED76BE4472A3EADBEC948C8F57060394.gr7.us-east-1.eks.amazonaws.com"
      },
      "name": "alexhunt-eks"
    }
  ],
  "contexts": [
    {
      "context": {
        "cluster": "alexhunt-eks",
        "user": "alexhunt-eks"
      },
      "name": "alexhunt-eks"
    }
  ],
  "current-context": "alexhunt-eks",
  "kind": "Config",
  "users": [
    {
      "name": "alexhunt-eks",
      "user": {
        "exec": {
          "apiVersion": "client.authentication.k8s.io/v1beta1",
          "args": [
            "eks",
            "get-token",
            "--cluster-name",
            "alexhunt-eks",
            "--region",
            "us-east-1",
            "--profile",
            "mz-scratch-admin"
          ],
          "command": "aws"
        }
      }
    }
  ]
}

@kay-kim Could you reproduce this and share the contents of your KUBECONFIG_DATA with us? You can save it by adding a line echo "$${KUBECONFIG_DATA}" > /tmp/kubeconfig in aws/modules/karpenter-nodepool/main.tf, just above the nodeclaims= line. When you terraform destroy, it will then be saved in /tmp/kubeconfig.

diff --git a/aws/modules/karpenter-nodepool/main.tf b/aws/modules/karpenter-nodepool/main.tf
index be1c8c6..8b727bf 100644
--- a/aws/modules/karpenter-nodepool/main.tf
+++ b/aws/modules/karpenter-nodepool/main.tf
@@ -15,6 +15,7 @@ resource "terraform_data" "destroyer" {
 
     command     = <<-EOF
       set -euo pipefail
+      echo "$${KUBECONFIG_DATA}" > /tmp/kubeconfig
       nodeclaims=$(kubectl --kubeconfig <(echo "$${KUBECONFIG_DATA}") get nodeclaims -l "karpenter.sh/nodepool=$${NODEPOOL_NAME}" -o name)
       if [ -n "$${nodeclaims}" ]; then
         echo "$${nodeclaims}" | xargs kubectl --kubeconfig <(echo "$${KUBECONFIG_DATA}") delete --wait=true

@kay-kim commented Dec 9, 2025

Will do.

  • On a side note, I have successfully deployed to GCP and Azure, upgraded (v26.1.0 -> v26.1.1), and destroyed, all successfully.

So I will try the deploy -> upgrade -> destroy flow on AWS this morning. (Previously, I didn't do the upgrade, just deploy -> destroy, but I want to double-check the upgrade steps while I have the deployment up.)

@kay-kim commented Dec 9, 2025

Off of the latest main branch (not this branch):

[Screenshot: 2025-12-09 at 9:20:11 AM]

(I will drop the /tmp/kubeconfig in Slack, as I'm not sure whether the content is cleared to share in a public repo.)

As an aside, last Friday:

  • I deployed the fix-local-exec branch and didn't get the error.
  • I then rebased the fix-local-exec branch onto main and didn't get this error, but got some other error.

@kay-kim commented Dec 9, 2025

@jshiwamV, we think we tracked it down to the fact that I'm on a Mac and am therefore using zsh.
I'm going to retry after locally rebasing this branch onto main.
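If the shell picked up by local-exec is the culprit, one possible workaround (an assumption on my part, not something tried in this thread) is to pin the provisioner's interpreter to bash so process substitution behaves the same on every machine:

provisioner "local-exec" {
  # Hypothetical tweak: force bash instead of the platform default shell so <(...) is always available.
  interpreter = ["/bin/bash", "-c"]
  command     = <<-EOF
    set -euo pipefail
    nodeclaims=$(kubectl --kubeconfig <(echo "$${KUBECONFIG_DATA}") get nodeclaims -l "karpenter.sh/nodepool=$${NODEPOOL_NAME}" -o name)
    # ... rest of the existing command and environment block unchanged
  EOF
}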

@kay-kim commented Dec 9, 2025

As an FYI, this branch does indeed fix the local-exec error. However, after locally rebasing this branch onto main, I hit an error with 38 objects remaining.
I've run terraform destroy four times now:

│ Error: deleting Security Group (sg-037dbf999aed801cc): operation error EC2: DeleteSecurityGroup, https response error StatusCode: 400, RequestID: d7ea4727-cd42-4021-a87f-3c9e28fc8c6b, api error DependencyViolation: resource sg-037dbf999aed801cc has a dependent object
│ 
│ 

The offending item seems to be an ENI.
Running:

aws ec2 describe-network-interfaces \
  --region us-east-1 \
  --filters Name=group-id,Values=sg-037dbf999aed801cc \
  --query "NetworkInterfaces[].{ENI:NetworkInterfaceId,Desc:Description,Status:Status,Attachment:Attachment.InstanceId,Owner:OwnerId,Requester:RequesterId,Subnet:SubnetId,PrivateIp:PrivateIpAddress}"

returns

{
    "ENI": "eni-0462b7d630a2c1848",
    "Desc": "aws-K8S-i-00432e250dd2f1426",
    ...

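If that dangling ENI is what's blocking the security group, a possible manual cleanup (not verified in this thread; detach the interface first if Attachment shows an instance):

# Delete the leaked network interface, then re-run terraform destroy.
aws ec2 delete-network-interface \
  --region us-east-1 \
  --network-interface-id eni-0462b7d630a2c1848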