
Conversation

@jshiwamV (Collaborator) commented Nov 23, 2025

  • terraform destroy fails with local-exec error #102: I was not able to reproduce the issue mentioned there, @jubrad. However, I have broken the one-liner command down into a set of commands that prepare the kubeconfig file properly before fetching the node claims.
  • Will raise follow-up PRs to properly document the prerequisites.

@jshiwamV jshiwamV requested a review from jubrad November 23, 2025 17:53
@jshiwamV jshiwamV marked this pull request as ready for review November 24, 2025 16:12
@alex-hunt-materialize (Contributor) left a comment

I don't see much point in writing the file. How does this fix things?

We should document that they need kubectl, though.

kubeconfig_file=$(mktemp)
echo "$${KUBECONFIG_DATA}" > "$${kubeconfig_file}"
trap "rm -f $${kubeconfig_file}" EXIT
@alex-hunt-materialize (Contributor) commented Nov 24, 2025

This trap line should happen before the write on the line above.
The file name should also be quoted inside the trap command.
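For example, a minimal sketch of the suggested ordering (assuming the same Terraform $$ escaping used in the snippet above):

kubeconfig_file=$(mktemp)
# Register cleanup before writing any data; single quotes defer expansion so the filename is quoted when the trap fires.
trap 'rm -f "$${kubeconfig_file}"' EXIT
echo "$${KUBECONFIG_DATA}" > "$${kubeconfig_file}"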

@jshiwamV (Collaborator, Author)

Yeah, I kind of suspected that doing everything in one line might be causing the issue. I will wait for @jubrad to reply, in case he knows the exact steps to reproduce this. Otherwise I would just get rid of this file change.

@alex-hunt-materialize (Contributor)

It seems that Kay can't reproduce it on this branch. I suspect that your initial thoughts were correct, and the process substitution isn't working the same on all machines.

Let's just fix the order of operations and the quoting here, and then we can merge this.

@kay-kim commented Dec 5, 2025

I was able to reproduce it again today off of the main branch (not with this branch, though). As an FYI, I am running examples/simple.

  1. Clone the Materialize Terraform repository:

    git clone https://github.com/MaterializeInc/materialize-terraform-self-managed.git
    cd materialize-terraform-self-managed/aws/examples/simple
  2. Create a terraform.tfvars file with the following variables:

    • name_prefix: Prefix for all resource names (e.g., simple-demo)
    • aws_region: AWS region for deployment (e.g., us-east-1)
    • aws_profile: AWS CLI profile to use
    • license_key: Materialize license key
    • tags: Map of tags to apply to resources

    For example:

    name_prefix = "simple-demo"
    aws_region  = "us-east-1"
    aws_profile = "your-aws-profile"
    license_key = "your-materialize-license-key"
    tags = {
      environment = "demo"
    }
  3. Initialize the Terraform directory to download the required providers and
    modules:

    terraform init
  4. Apply the Terraform configuration to create the infrastructure:

    terraform apply

    Terraform will prompt you to confirm the deployment. Type yes to proceed.

  5. I connected and did some stuff.

  6. Then to clean up, I did

    terraform destroy

    which returned

module.database.aws_security_group.database: Destroying... [id=sg-03547e532217e4cb6]
module.database.aws_security_group.database: Destruction complete after 0s
╷
│ Warning: Helm uninstall returned an information message
│ 
│ These resources were kept due to the resource policy:
│ [CustomResourceDefinition] certificaterequests.cert-manager.io
│ [CustomResourceDefinition] certificates.cert-manager.io
│ [CustomResourceDefinition] challenges.acme.cert-manager.io
│ [CustomResourceDefinition] clusterissuers.cert-manager.io
│ [CustomResourceDefinition] issuers.cert-manager.io
│ [CustomResourceDefinition] orders.acme.cert-manager.io
│ 
╵
╷
│ Error: local-exec provisioner error
│ 
│   with module.nodepool_materialize.terraform_data.destroyer,
│   on ../../modules/karpenter-nodepool/main.tf line 13, in resource "terraform_data" "destroyer":
│   13:   provisioner "local-exec" {
│ 
│ Error running command 'set -euo pipefail
│ nodeclaims=$(kubectl --kubeconfig <(echo "${KUBECONFIG_DATA}") get nodeclaims -l "karpenter.sh/nodepool=${NODEPOOL_NAME}" -o name)
│ if [ -n "${nodeclaims}" ]; then
│   echo "${nodeclaims}" | xargs kubectl --kubeconfig <(echo "${KUBECONFIG_DATA}") delete --wait=true
│ fi
': exit status 1. Output: E1205 09:37:24.943357   14318 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:24.945197   14318 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:24.946903   14318 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:24.948603   14318 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:24.950221   14318 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ The connection to the server localhost:8080 was refused - did you specify the right host or port?
│ 
╵
╷
│ Error: local-exec provisioner error
│ 
│   with module.nodepool_generic.terraform_data.destroyer,
│   on ../../modules/karpenter-nodepool/main.tf line 13, in resource "terraform_data" "destroyer":
│   13:   provisioner "local-exec" {
│ 
│ Error running command 'set -euo pipefail
│ nodeclaims=$(kubectl --kubeconfig <(echo "${KUBECONFIG_DATA}") get nodeclaims -l "karpenter.sh/nodepool=${NODEPOOL_NAME}" -o name)
│ if [ -n "${nodeclaims}" ]; then
│   echo "${nodeclaims}" | xargs kubectl --kubeconfig <(echo "${KUBECONFIG_DATA}") delete --wait=true
│ fi
': exit status 1. Output: E1205 09:37:38.421980   14369 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:38.423914   14369 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:38.425586   14369 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:38.427157   14369 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ E1205 09:37:38.428754   14369 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get
\"http://localhost:8080/api?timeout=32s\": dial tcp [::1]:8080: connect: connection refused"
│ The connection to the server localhost:8080 was refused - did you specify the right host or port?
│ 
╵

@alex-hunt-materialize (Contributor)

This doesn't look like an issue with the command, but likely an issue with setting the KUBECONFIG_DATA variable. We shouldn't be pointing at localhost:8080, but at the address of the K8S API server in AWS.
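For example, one way to check which API server a given kubeconfig resolves to (a diagnostic sketch run directly under bash, so no Terraform $$ escaping; not part of the module):

# Print the server endpoint of the current context; an empty or unreadable kubeconfig makes kubectl fall back to its default of localhost:8080.
kubectl --kubeconfig <(echo "${KUBECONFIG_DATA}") config view --minify -o jsonpath='{.clusters[0].cluster.server}'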

@jubrad (Collaborator) commented Dec 9, 2025

This doesn't look like an issue with the command, but likely an issue with setting the KUBECONFIG_DATA variable. We shouldn't be pointing at localhost:8080, but at the address of the K8S API server in AWS.

Yeah, I'm not sure what would be pointing at localhost unless it's something in the local setup?

@jshiwamV (Collaborator, Author) commented Dec 9, 2025

Terraform will always point to the KUBECONFIG_DATA that we pass from aws/examples/simple/main.tf. Unless that is overridden, the API server endpoint shouldn't be pointing to localhost. Can you confirm what is inside the KUBECONFIG_DATA var? I tried to recreate this issue multiple times but was unable to reproduce it.

@jshiwamV (Collaborator, Author) commented Dec 9, 2025

And is this issue deterministic? Does it happen on every destroy?

@alex-hunt-materialize (Contributor)

I cannot reproduce the issue either. My KUBECONFIG_DATA looks like (after formatting):

{
  "apiVersion": "v1",
  "clusters": [
    {
      "cluster": {
        "certificate-authority-data": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURCVENDQWUyZ0F3SUJBZ0lJRXovb2xnZEVCVVF3RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TlRFeU1Ea3hNREl6TVRCYUZ3MHpOVEV5TURjeE1ESTRNVEJhTUJVeApFekFSQmdOVkJBTVRDbXQxWW1WeWJtVjBaWE13Z2dFaU1BMEdDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLCkFvSUJBUURDL0ZBWlRXV1lUcGRUM2czY081dHAwQWlVWWdTSGlyKzVaWC8vQW55RXpZQWF4N2I0cWtSUzNNenkKRDZBY2VxOWl2TWFPcU5oOXJYSEE5T25Vd0pOSnFrdldWTFduMlpVZ0haZXJNT2pZczI0ejgyNzdRcWFjckpxTQpuTlVNYlZIam1ZbmFoWE5OdkN4MDRldjg0SGFubTgvUldkSkxlQ210SjVWS1prWEp5MWY0VVpSWW1DNVZGL2Z5CmphMHMyR0FBaUE3a2tQVks1TlZBVmZjRVhTSzQxN3dKUFNUUS9xdUVQaXBrTGtISm5aQ2h3ZElxOHM4NlZQMjAKUFdlZGs4YkxlVFZJOVV4Ylo3ZUsvT2NEZFQ2Q2lzNHpvQTA2VnlPQXp6d2ZhZERoelYyMHNDd2tjVldjakd1VgphMk5FZk1Sb3ZFRGNGY1o5cThtSmlkbXF0ZkFUQWdNQkFBR2pXVEJYTUE0R0ExVWREd0VCL3dRRUF3SUNwREFQCkJnTlZIUk1CQWY4RUJUQURBUUgvTUIwR0ExVWREZ1FXQkJTa0pkK3B0amxCNDMxMEN0bzM5RFhoWGFIa0hUQVYKQmdOVkhSRUVEakFNZ2dwcmRXSmxjbTVsZEdWek1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQVM3b1lCQitDbApCMHNZRUdwNjF4QjNrUjRKTDN5SzJ1Wi9nMXVsQWE1VnlzcGQ3MTBFSlhLOXNTekVFQUp4cjU2amROZ055VW9NCkJyYlhEYnN1NkdKUU4yTW42UHY4MWZHbENyem5hUjlJNFpObUpsdXRqVDZGaFFnNHduY1hTcmhETG9mdjlJdm8Kdi82NVp0UnZibXF4Ui9acWhxVDI2amY3bG5ibWhOM0lOQVVOUEUxYVhPeWp5Zm1qSkVadlFxVVlaL2lYUDAzcwpBSkM3ai9SMHU1QzdlVWhybnFzakpDSUZCN1pFaTZMMENuY1F1RFppMVRZdERlaURwK1BQK2Ywd0ZHRnNwWHdFCnp6RlpJVVhQYW1QUWJLcXdZdTRobWJzd01nU0gzbGRJZ2FNTW85Sm5TSUxtS1NXaFBibGJNbmRuTmNNSnRVeHkKUzJHWXpENlAxWUNkCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K",
        "server": "https://ED76BE4472A3EADBEC948C8F57060394.gr7.us-east-1.eks.amazonaws.com"
      },
      "name": "alexhunt-eks"
    }
  ],
  "contexts": [
    {
      "context": {
        "cluster": "alexhunt-eks",
        "user": "alexhunt-eks"
      },
      "name": "alexhunt-eks"
    }
  ],
  "current-context": "alexhunt-eks",
  "kind": "Config",
  "users": [
    {
      "name": "alexhunt-eks",
      "user": {
        "exec": {
          "apiVersion": "client.authentication.k8s.io/v1beta1",
          "args": [
            "eks",
            "get-token",
            "--cluster-name",
            "alexhunt-eks",
            "--region",
            "us-east-1",
            "--profile",
            "mz-scratch-admin"
          ],
          "command": "aws"
        }
      }
    }
  ]
}

@kay-kim Could you reproduce this and share the contents of your KUBECONFIG_DATA with us? You can save it by adding a line echo "$${KUBECONFIG_DATA}" > /tmp/kubeconfig in aws/modules/karpenter-nodepool/main.tf, just above the nodeclaims= line. When you terraform destroy, it will then be saved in /tmp/kubeconfig.

diff --git a/aws/modules/karpenter-nodepool/main.tf b/aws/modules/karpenter-nodepool/main.tf
index be1c8c6..8b727bf 100644
--- a/aws/modules/karpenter-nodepool/main.tf
+++ b/aws/modules/karpenter-nodepool/main.tf
@@ -15,6 +15,7 @@ resource "terraform_data" "destroyer" {
 
     command     = <<-EOF
       set -euo pipefail
+      echo "$${KUBECONFIG_DATA}" > /tmp/kubeconfig
       nodeclaims=$(kubectl --kubeconfig <(echo "$${KUBECONFIG_DATA}") get nodeclaims -l "karpenter.sh/nodepool=$${NODEPOOL_NAME}" -o name)
       if [ -n "$${nodeclaims}" ]; then
         echo "$${nodeclaims}" | xargs kubectl --kubeconfig <(echo "$${KUBECONFIG_DATA}") delete --wait=true

@kay-kim commented Dec 9, 2025

Will do.

  • On a side note, I have successfully deployed to GCP and Azure, upgraded (v26.1.0 -> v26.1.1), and destroyed, all successfully.

So I will try the deploy -> upgrade -> destroy flow on AWS this morning. (Previously, I didn't do the upgrade, just deploy -> destroy, but I want to double-check the upgrade steps while I have the deployment up.)

@kay-kim commented Dec 9, 2025

Off of the latest main branch (not this branch):

[Screenshot: 2025-12-09 at 9:20:11 AM]

(I will drop the /tmp/kubeconfig in Slack, as I'm not sure whether the content is cleared to share in a public repo.)

As an aside, last Friday:

  • I deployed the fix-local-exec branch and didn't get the error.
  • I then rebased the fix-local-exec branch onto main and didn't get this error, but got some other error.

@kay-kim commented Dec 9, 2025

@jshiwamV, we think we tracked it down to the fact that I'm on a Mac and am therefore using zsh.
I'm going to retry after locally rebasing this branch onto main.
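If the shell picked up by local-exec is the culprit, one possible workaround (an assumption on my part, not something tried in this thread) is to pin the provisioner's interpreter to bash so process substitution behaves the same on every machine:

provisioner "local-exec" {
  # Hypothetical tweak: force bash instead of the platform default shell so <(...) is always available.
  interpreter = ["/bin/bash", "-c"]
  command     = <<-EOF
    set -euo pipefail
    nodeclaims=$(kubectl --kubeconfig <(echo "$${KUBECONFIG_DATA}") get nodeclaims -l "karpenter.sh/nodepool=$${NODEPOOL_NAME}" -o name)
    # ... rest of the existing command and environment block unchanged
  EOF
}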

@kay-kim commented Dec 9, 2025

As an FYI, this branch does indeed fix the local-exec error. However, after locally rebasing this branch onto main, I hit an error with 38 objects remaining.
I've run terraform destroy four times now:

│ Error: deleting Security Group (sg-037dbf999aed801cc): operation error EC2: DeleteSecurityGroup, https response error StatusCode: 400, RequestID: d7ea4727-cd42-4021-a87f-3c9e28fc8c6b, api error DependencyViolation: resource sg-037dbf999aed801cc has a dependent object
│ 
│ 

The offending item seems to be an ENI.
Running:

aws ec2 describe-network-interfaces \
  --region us-east-1 \
  --filters Name=group-id,Values=sg-037dbf999aed801cc \
  --query "NetworkInterfaces[].{ENI:NetworkInterfaceId,Desc:Description,Status:Status,Attachment:Attachment.InstanceId,Owner:OwnerId,Requester:RequesterId,Subnet:SubnetId,PrivateIp:PrivateIpAddress}"

returns

{
    "ENI": "eni-0462b7d630a2c1848",
    "Desc": "aws-K8S-i-00432e250dd2f1426",
    ...

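If that dangling ENI is what's blocking the security group, a possible manual cleanup (not verified in this thread; detach the interface first if Attachment shows an instance):

# Delete the leaked network interface, then re-run terraform destroy.
aws ec2 delete-network-interface \
  --region us-east-1 \
  --network-interface-id eni-0462b7d630a2c1848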