[BUG] - Deploying locally on Mac OS, Unable to pull conda-store-server Image #2918

Closed
shikanchen opened this issue Jan 21, 2025 · 5 comments
Labels
needs: triage 🚦 (Someone needs to have a look at this issue and triage) · type: bug 🐛 (Something isn't working)

Comments

@shikanchen

Describe the bug

  • Deployments for conda-store-server and conda-store-worker fail to roll out during a local Nebari deployment on macOS.
  • The deployment hangs with OpenTofu repeatedly reporting Still creating... for roughly ten minutes.
  • OpenTofu then fails with:
Error: Waiting for rollout to finish: 1 replicas wanted; 0 replicas Ready
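To see why the replicas never become Ready, the pending pods and their events can be inspected. A minimal sketch, assuming the default dev namespace of a local Nebari deployment:

# List the conda-store pods and their current status (e.g. Pending or ImagePullBackOff)
kubectl get pods -n dev | grep conda-store

# Show the events for a stuck pod, including any image pull errors
kubectl describe pod <conda-store-server-pod-name> -n dev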

Expected behavior

  • Deployments for conda-store-server and conda-store-worker should complete successfully.
  • The conda-store-server and conda-store-worker pods should run and become Ready.
  • Docker should pull the appropriate linux/arm64 image without errors.

OS and architecture in which you are running Nebari

OS: macOS Ventura 13.5.2; Architecture: ARM64 (Apple Silicon)

How to Reproduce the problem?

  • Install Nebari using conda install nebari -c conda-forge. Details of the environment used for the deployment are attached in the "Anything else?" section.
  • Create a configuration file (nebari-config.yaml) for a local deployment on macOS.
  • Run the deployment with the command sketched below.
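A minimal sketch of the deployment command, assuming the standard Nebari CLI invocation for a local deployment:

# Deploy using the local configuration file created in the previous step
nebari deploy -c nebari-config.yaml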

Command output

[tofu]: module.jupyterhub.local_file.overrides_json: Creating...
[tofu]: module.jupyterhub.local_file.overrides_json: Creation complete after 0s [id=2c96c288175afecf06207ae7a1228941ae6fed75]
[tofu]: module.kubernetes-conda-store-server.kubernetes_deployment.worker: Destroying... [id=dev/nebari-conda-store-worker]
[tofu]: module.kubernetes-conda-store-server.kubernetes_deployment.server: Destroying... [id=dev/nebari-conda-store-server]
[tofu]: module.kubernetes-conda-store-server.kubernetes_deployment.server: Destruction complete after 0s
[tofu]: module.kubernetes-conda-store-server.kubernetes_deployment.worker: Destruction complete after 0s
[tofu]: module.kubernetes-conda-store-server.kubernetes_deployment.server: Creating...
[tofu]: module.kubernetes-conda-store-server.kubernetes_deployment.worker: Creating...
[tofu]: module.kubernetes-conda-store-server.kubernetes_deployment.server: Still creating... [10s elapsed]
[tofu]: module.kubernetes-conda-store-server.kubernetes_deployment.worker: Still creating... [10s elapsed]
[tofu]: [... identical "Still creating..." messages for the server and worker deployments repeat every 10 seconds ...]
[tofu]: module.kubernetes-conda-store-server.kubernetes_deployment.server: Still creating... [9m50s elapsed]
[tofu]: module.kubernetes-conda-store-server.kubernetes_deployment.worker: Still creating... [10m0s elapsed]
[tofu]: ╷
[tofu]: │ Warning: "default_secret_name" is no longer applicable for Kubernetes v1.24.0 and above
[tofu]: │
[tofu]: │   with module.argo-workflows[0].kubernetes_service_account_v1.argo-admin-sa,
[tofu]: │   on modules/kubernetes/services/argo-workflows/main.tf line 188, in resource "kubernetes_service_account_v1" "argo-admin-sa":
[tofu]: │  188: resource "kubernetes_service_account_v1" "argo-admin-sa" {
[tofu]: │
[tofu]: │ Starting from version 1.24.0 Kubernetes does not automatically generate a
[tofu]: │ token for service accounts, in this case, "default_secret_name" will be
[tofu]: │ empty
[tofu]: │
[tofu]: │ (and 5 more similar warnings elsewhere)
[tofu]: ╵
[tofu]: ╷
[tofu]: │ Error: Waiting for rollout to finish: 1 replicas wanted; 0 replicas Ready
[tofu]: │
[tofu]: │   with module.kubernetes-conda-store-server.kubernetes_deployment.server,
[tofu]: │   on modules/kubernetes/services/conda-store/server.tf line 102, in resource "kubernetes_deployment" "server":
[tofu]: │  102: resource "kubernetes_deployment" "server" {
[tofu]: │
[tofu]: ╵
[tofu]: ╷
[tofu]: │ Error: Waiting for rollout to finish: 1 replicas wanted; 0 replicas Ready
[tofu]: │
[tofu]: │   with module.kubernetes-conda-store-server.kubernetes_deployment.worker,
[tofu]: │   on modules/kubernetes/services/conda-store/worker.tf line 59, in resource "kubernetes_deployment" "worker":
[tofu]: │   59: resource "kubernetes_deployment" "worker" {
[tofu]: │
[tofu]: ╵
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/shikanchen/anaconda3/envs/nebari/lib/python3.12/site-packages/_nebari/subcommands/deploy. │
│ py:92 in deploy                                                                                  │
│                                                                                                  │
│   89 │   │   │   msg = "Digital Ocean support is currently being deprecated and will be remov    │
│   90 │   │   │   typer.confirm(msg)                                                              │
│   91 │   │                                                                                       │
│ ❱ 92 │   │   deploy_configuration(                                                               │
│   93 │   │   │   config,                                                                         │
│   94 │   │   │   stages,                                                                         │
│   95 │   │   │   disable_prompt=disable_prompt,                                                  │
│                                                                                                  │
│ /Users/shikanchen/anaconda3/envs/nebari/lib/python3.12/site-packages/_nebari/deploy.py:55 in     │
│ deploy_configuration                                                                             │
│                                                                                                  │
│   52 │   │   │   │   s: hookspecs.NebariStage = stage(                                           │
│   53 │   │   │   │   │   output_directory=pathlib.Path.cwd(), config=config                      │
│   54 │   │   │   │   )                                                                           │
│ ❱ 55 │   │   │   │   stack.enter_context(s.deploy(stage_outputs, disable_prompt))                │
│   56 │   │   │   │                                                                               │
│   57 │   │   │   │   if not disable_checks:                                                      │
│   58 │   │   │   │   │   s.check(stage_outputs, disable_prompt)                                  │
│                                                                                                  │
│ /Users/shikanchen/anaconda3/envs/nebari/lib/python3.12/contextlib.py:526 in enter_context        │
│                                                                                                  │
│   523 │   │   except AttributeError:                                                             │
│   524 │   │   │   raise TypeError(f"'{cls.__module__}.{cls.__qualname__}' object does "
│   525 │   │   │   │   │   │   │   f"not support the context manager protocol") from None         │
│ ❱ 526 │   │   result = _enter(cm)                                                                │
│   527 │   │   self._push_cm_exit(cm, _exit)                                                      │
│   528 │   │   return result                                                                      │
│   529                                                                                            │
│                                                                                                  │
│ /Users/shikanchen/anaconda3/envs/nebari/lib/python3.12/contextlib.py:137 in __enter__            │
│                                                                                                  │
│   134 │   │   # they are only needed for recreation, which is not possible anymore               │
│   135 │   │   del self.args, self.kwds, self.func                                                │
│   136 │   │   try:                                                                               │
│ ❱ 137 │   │   │   return next(self.gen)                                                          │
│   138 │   │   except StopIteration:                                                              │
│   139 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   140                                                                                            │
│                                                                                                  │
│ /Users/shikanchen/anaconda3/envs/nebari/lib/python3.12/site-packages/_nebari/stages/base.py:298  │
│ in deploy                                                                                        │
│                                                                                                  │
│   295 │   │   │   deploy_config["tofu_import"] = True                                            │
│   296 │   │   │   deploy_config["state_imports"] = state_imports                                 │
│   297 │   │                                                                                      │
│ ❱ 298 │   │   self.set_outputs(stage_outputs, opentofu.deploy(**deploy_config))                  │
│   299 │   │   self.post_deploy(stage_outputs, disable_prompt)                                    │
│   300 │   │   yield                                                                              │
│   301                                                                                            │
│                                                                                                  │
│ /Users/shikanchen/anaconda3/envs/nebari/lib/python3.12/site-packages/_nebari/provider/opentofu.p │
│ y:71 in deploy                                                                                   │
│                                                                                                  │
│    68 │   │   │   │   )                                                                          │
│    69 │   │                                                                                      │
│    70 │   │   if tofu_apply:                                                                     │
│ ❱  71 │   │   │   apply(directory, var_files=[f.name])                                           │
│    72 │   │                                                                                      │
│    73 │   │   if tofu_destroy:                                                                   │
│    74 │   │   │   destroy(directory, var_files=[f.name])                                         │
│                                                                                                  │
│ /Users/shikanchen/anaconda3/envs/nebari/lib/python3.12/site-packages/_nebari/provider/opentofu.p │
│ y:152 in apply                                                                                   │
│                                                                                                  │
│   149 │   │   + ["-var-file=" + _ for _ in var_files]                                            │
│   150 │   )                                                                                      │
│   151 │   with timer(logger, "tofu apply"):                                                      │
│ ❱ 152 │   │   run_tofu_subprocess(command, cwd=directory, prefix="tofu")                         │
│   153                                                                                            │
│   154                                                                                            │
│   155 def output(directory=None):                                                                │
│                                                                                                  │
│ /Users/shikanchen/anaconda3/envs/nebari/lib/python3.12/site-packages/_nebari/provider/opentofu.p │
│ y:120 in run_tofu_subprocess                                                                     │
│                                                                                                  │
│   117 │   logger.info(f" tofu at {tofu_path}")                                                   │
│   118 │   exit_code, output = run_subprocess_cmd([tofu_path] + processargs, **kwargs)            │
│   119 │   if exit_code != 0:                                                                     │
│ ❱ 120 │   │   raise OpenTofuException("OpenTofu returned an error")                              │
│   121 │   return output                                                                          │
│   122                                                                                            │
│   123                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OpenTofuException: OpenTofu returned an error

Versions and dependencies used.

  • Docker version: 27.4.0, build bde2b89
  • Conda version: 24.5.0
  • Kubectl version:
    • Client Version: v1.30.5
    • Server Version: v1.29.2
  • Nebari version: 2024.12.1

Compute environment

None

Integrations

conda-store

Anything else?

name: nebari
channels:
  - conda-forge
  - defaults
dependencies:
  - aiohappyeyeballs=2.4.4=pyhd8ed1ab_1
  - aiohttp=3.11.11=py312h998013c_0
  - aiosignal=1.3.2=pyhd8ed1ab_0
  - annotated-types=0.7.0=pyhd8ed1ab_1
  - attrs=24.3.0=pyh71513ae_0
  - auth0-python=4.7.1=pyhd8ed1ab_0
  - azure-common=1.1.28=pyhd8ed1ab_1
  - azure-core=1.32.0=pyhff2d567_0
  - azure-identity=1.12.0=pyhd8ed1ab_0
  - azure-mgmt-containerservice=26.0.0=pyhd8ed1ab_0
  - azure-mgmt-core=1.5.0=pyhd8ed1ab_1
  - azure-mgmt-resource=23.0.1=pyhd8ed1ab_2
  - bcrypt=4.0.1=py312h0002256_1
  - beautifulsoup4=4.12.3=pyha770c72_1
  - blinker=1.9.0=pyhff2d567_0
  - boto3=1.34.63=pyhd8ed1ab_0
  - botocore=1.34.162=pyge310_1234567_0
  - brotli-python=1.1.0=py312hde4cb15_2
  - bzip2=1.0.8=h99b78c6_7
  - c-ares=1.34.4=h5505292_0
  - ca-certificates=2024.12.14=hf0a4a13_0
  - cachetools=5.5.0=pyhd8ed1ab_1
  - cairo=1.18.2=h6a3b0d2_1
  - certifi=2024.12.14=pyhd8ed1ab_0
  - cffi=1.17.1=py312h0fad829_0
  - charset-normalizer=3.4.1=pyhd8ed1ab_0
  - click=8.1.8=pyh707e725_0
  - cloudflare=2.11.7=pyhd8ed1ab_0
  - colorama=0.4.6=pyhd8ed1ab_1
  - cryptography=42.0.8=py312had01cb0_0
  - deprecation=2.1.0=pyh9f0ad1d_0
  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
  - font-ttf-inconsolata=3.000=h77eed37_0
  - font-ttf-source-code-pro=2.038=h77eed37_0
  - font-ttf-ubuntu=0.83=h77eed37_3
  - fontconfig=2.15.0=h1383a14_1
  - fonts-conda-ecosystem=1=0
  - fonts-conda-forge=1=0
  - freetype=2.12.1=hadb7bae_2
  - frozenlist=1.5.0=py312h0bf5046_0
  - google-api-core=2.24.0=pyhd8ed1ab_0
  - google-api-core-grpc=2.24.0=hd8ed1ab_0
  - google-auth=2.31.0=pyhff2d567_0
  - google-cloud-compute=1.19.1=pyhff2d567_0
  - google-cloud-container=2.49.0=pyhd8ed1ab_0
  - google-cloud-core=2.4.1=pyhd8ed1ab_1
  - google-cloud-iam=2.15.1=pyhd8ed1ab_0
  - google-cloud-storage=2.18.0=pyhff2d567_0
  - google-crc32c=1.1.2=py312h1fa1217_6
  - google-resumable-media=2.7.2=pyhd8ed1ab_2
  - googleapis-common-protos=1.66.0=pyhff2d567_0
  - googleapis-common-protos-grpc=1.66.0=pyhff2d567_0
  - grpc-google-iam-v1=0.13.1=pyhd8ed1ab_1
  - grpcio=1.62.2=py312h17030e7_0
  - grpcio-status=1.62.2=pyhd8ed1ab_0
  - h2=4.1.0=pyhd8ed1ab_1
  - hpack=4.0.0=pyhd8ed1ab_1
  - hyperframe=6.0.1=pyhd8ed1ab_1
  - icu=75.1=hfee45f7_0
  - idna=3.10=pyhd8ed1ab_1
  - isodate=0.7.2=pyhd8ed1ab_1
  - jinja2=3.1.5=pyhd8ed1ab_0
  - jmespath=1.0.1=pyhd8ed1ab_1
  - jsonlines=4.0.0=pyhd8ed1ab_0
  - jwcrypto=1.5.6=pyhd8ed1ab_1
  - libabseil=20240116.2=cxx17_h00cdb27_1
  - libcrc32c=1.1.2=hbdafb3b_0
  - libcxx=19.1.7=ha82da77_0
  - libexpat=2.6.4=h286801f_0
  - libffi=3.4.2=h3422bc3_5
  - libgcrypt-lib=1.11.0=h5505292_2
  - libgirepository=1.82.0=h607895c_0
  - libglib=2.82.2=hdff4504_1
  - libgpg-error=1.51=h579ddeb_1
  - libgrpc=1.62.2=h9c18a4f_0
  - libiconv=1.17=h0d3ecfb_2
  - libintl=0.22.5=h8414b35_3
  - liblzma=5.6.3=h39f12f2_1
  - libpng=1.6.45=h3783ad8_0
  - libprotobuf=4.25.3=hc39d83c_1
  - libre2-11=2023.09.01=h7b2c953_2
  - libsecret=0.21.6=h4e030ea_0
  - libsodium=1.0.20=h99b78c6_0
  - libsqlite=3.48.0=h3f77e49_0
  - libzlib=1.3.1=h8359307_2
  - markdown-it-py=3.0.0=pyhd8ed1ab_1
  - markupsafe=3.0.2=py312h998013c_1
  - mdurl=0.1.2=pyhd8ed1ab_1
  - msal=1.31.1=pyhd8ed1ab_0
  - msal_extensions=1.2.0=py312h81bd7bf_2
  - multidict=6.1.0=py312hdb8e49c_1
  - ncurses=6.5=h5e97a16_2
  - nebari=2024.12.1=pyh707e725_0
  - oauthlib=3.2.2=pyhd8ed1ab_1
  - openssl=3.4.0=h81ee809_1
  - packaging=23.2=pyhd8ed1ab_0
  - pcre2=10.44=h297a79d_2
  - pip=24.3.1=pyh8b19718_2
  - pixman=0.44.2=h2f9eb0b_0
  - pluggy=1.3.0=pyhd8ed1ab_0
  - portalocker=2.10.1=py312h81bd7bf_1
  - prompt-toolkit=3.0.36=pyha770c72_0
  - prompt_toolkit=3.0.36=hd8ed1ab_0
  - propcache=0.2.1=py312hea69d52_0
  - proto-plus=1.25.0=pyhd8ed1ab_1
  - protobuf=4.25.3=py312he4aa971_1
  - pyasn1=0.6.1=pyhd8ed1ab_2
  - pyasn1-modules=0.4.1=pyhd8ed1ab_1
  - pycairo=1.27.0=py312h798cee4_0
  - pycparser=2.22=pyh29332c3_1
  - pydantic=2.9.2=pyhd8ed1ab_0
  - pydantic-core=2.23.4=py312he431725_0
  - pygments=2.19.1=pyhd8ed1ab_0
  - pygobject=3.50.0=py312hc4f7465_1
  - pyjwt=2.10.1=pyhd8ed1ab_0
  - pynacl=1.5.0=py312h024a12e_4
  - pyopenssl=25.0.0=pyhd8ed1ab_0
  - pysocks=1.7.1=pyha55dd90_7
  - python=3.12.8=hc22306f_1_cpython
  - python-dateutil=2.9.0.post0=pyhff2d567_1
  - python-keycloak=3.12.0=pyhd8ed1ab_0
  - python-kubernetes=27.2.0=pyhd8ed1ab_0
  - python_abi=3.12=5_cp312
  - pyu2f=0.1.5=pyhd8ed1ab_1
  - pyyaml=6.0.2=py312h024a12e_1
  - questionary=2.0.0=pyhd8ed1ab_0
  - re2=2023.09.01=h4cba328_2
  - readline=8.2=h92ec313_1
  - requests=2.32.3=pyhd8ed1ab_1
  - requests-oauthlib=2.0.0=pyhd8ed1ab_1
  - requests-toolbelt=1.0.0=pyhd8ed1ab_1
  - rich=13.5.1=pyhd8ed1ab_0
  - rsa=4.9=pyhd8ed1ab_1
  - ruamel.yaml=0.18.6=py312h0bf5046_1
  - ruamel.yaml.clib=0.2.8=py312h0bf5046_1
  - s3transfer=0.10.4=pyhd8ed1ab_1
  - setuptools=75.8.0=pyhff2d567_0
  - shellingham=1.5.4=pyhd8ed1ab_1
  - six=1.17.0=pyhd8ed1ab_0
  - soupsieve=2.5=pyhd8ed1ab_1
  - tk=8.6.13=h5083fa2_1
  - typer=0.9.0=pyhd8ed1ab_0
  - typing-extensions=4.11.0=hd8ed1ab_0
  - typing_extensions=4.11.0=pyha770c72_0
  - tzdata=2025a=h78e105d_0
  - urllib3=2.3.0=pyhd8ed1ab_0
  - wcwidth=0.2.13=pyhd8ed1ab_1
  - websocket-client=1.8.0=pyhd8ed1ab_1
  - wheel=0.45.1=pyhd8ed1ab_1
  - yaml=0.2.5=h3422bc3_2
  - yarl=1.18.3=py312hea69d52_0
  - zstandard=0.23.0=py312h15fbf35_1
  - zstd=1.5.6=hb46c0d2_0
@shikanchen added the needs: triage 🚦 and type: bug 🐛 labels on Jan 21, 2025
@viniciusdc
Contributor

viniciusdc commented Jan 21, 2025

I think @marcelovilla has encountered a similar problem recently; if it's the same, the issue was that the most recent conda-store Docker image releases didn't follow the proper tagging scheme for the ARM images. The solution back then was to pass the SHA hash manually instead of the tag.

You will need to make a quick change to the deployment manifest; here's a command for you to try:

kubectl set image deployment/nebari-conda-store-worker conda-store-server=<new-image> --namespace=<namespace>

This will soon be addressed by the conda-store team, but in the meantime the above should be a good workaround.
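After pointing the deployment at a new image, the rollout can be verified; a quick sketch, assuming the dev namespace used by a local Nebari deployment:

# Wait for the worker deployment to finish rolling out with the new image
kubectl rollout status deployment/nebari-conda-store-worker -n dev

# Confirm the conda-store pods are Running and Ready
kubectl get pods -n dev | grep conda-store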

@marcelovilla
Member

@shikanchen can you try adding this block in your nebari-config.yaml and redeploying?

conda_store:
  image: quay.io/quansight/conda-store-server
  image_tag: sha-f8875ca

As @viniciusdc mentioned, the conda-store images are not being properly tagged for ARM, so you have to specify the hash of the specific build.

You can find all the images at https://quay.io/repository/quansight/conda-store-server?tab=tags. sha-f8875ca corresponds to the 2024.11.2 release, which is the latest.
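One way to confirm that a given tag actually publishes a linux/arm64 image before redeploying is to inspect its manifest list; a sketch using the tag suggested above:

# List the platforms published for this tag (look for a linux/arm64 entry)
docker manifest inspect quay.io/quansight/conda-store-server:sha-f8875ca | grep -A1 architecture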

@shikanchen
Author

shikanchen commented Jan 21, 2025

@marcelovilla Thank you for the suggested fix. I applied it, updating the image and tag for conda-store-server in the nebari-config.yaml file, and after redeploying, the deployment successfully passed the image-pulling step. However, the conda-store health check now fails consistently. The issue persists even after trying older released images from https://quay.io/repository/quansight/conda-store-server?tab=tags. The conda-store-server pod shows as running (READY 1/1), but it consistently fails the health check for https://172.18.1.100/conda-store/api/v1/.

Here's the output of the issue:

[tofu]: Apply complete! Resources: 1 added, 2 changed, 0 destroyed.
[tofu]:
[tofu]: Outputs:
[tofu]:
[tofu]: forward-auth-middleware = {
[tofu]:   "name" = "traefik-forward-auth"
[tofu]: }
[tofu]: forward-auth-service = {
[tofu]:   "name" = "forwardauth-service"
[tofu]: }
[tofu]: service_urls = {
[tofu]:   "argo-workflows" = {
[tofu]:     "health_url" = "https://172.18.1.100/argo/"
[tofu]:     "url" = "https://172.18.1.100/argo/"
[tofu]:   }
[tofu]:   "conda_store" = {
[tofu]:     "health_url" = "https://172.18.1.100/conda-store/api/v1/"
[tofu]:     "url" = "https://172.18.1.100/conda-store/"
[tofu]:   }
[tofu]:   "dask_gateway" = {
[tofu]:     "health_url" = "https://172.18.1.100/gateway/api/version"
[tofu]:     "url" = "https://172.18.1.100/gateway/"
[tofu]:   }
[tofu]:   "jupyterhub" = {
[tofu]:     "health_url" = "https://172.18.1.100/hub/api/"
[tofu]:     "url" = "https://172.18.1.100/"
[tofu]:   }
[tofu]:   "keycloak" = {
[tofu]:     "health_url" = "https://172.18.1.100/auth/realms/master"
[tofu]:     "url" = "https://172.18.1.100/auth/"
[tofu]:   }
[tofu]:   "monitoring" = {
[tofu]:     "health_url" = "https://172.18.1.100/monitoring/api/health"
[tofu]:     "url" = "https://172.18.1.100/monitoring/"
[tofu]:   }
[tofu]: }
Attempt 1 health check succeeded for url=https://172.18.1.100/argo/
Attempt 1 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 2 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 3 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 4 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 5 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 6 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 7 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 8 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 9 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 10 health check failed for url=https://172.18.1.100/conda-store/api/v1/
ERROR: Service conda_store DOWN when checking url=https://172.18.1.100/conda-store/api/v1/

The pod statuses I pulled after the issue occurred:

(nebari) ➜  ~ kubectl get pods -n dev
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-nebari-kube-prometheus-sta-alertmanager-0   2/2     Running   0          27m
argo-workflows-server-585dd7f586-brc6h                   1/1     Running   0          30m
argo-workflows-workflow-controller-586dcfd8f7-5tcc5      1/1     Running   0          30m
continuous-image-puller-vg8mx                            1/1     Running   0          25m
forwardauth-deployment-7975cf64db-9f86t                  1/1     Running   0          30m
hub-9d4c94bcd-k78zs                                      1/1     Running   0          25m
keycloak-0                                               1/1     Running   0          32m
keycloak-postgresql-0                                    1/1     Running   0          32m
loki-backend-0                                           2/2     Running   0          29m
loki-canary-5jf9v                                        1/1     Running   0          29m
loki-gateway-bf4d7b485-zfcxn                             1/1     Running   0          29m
loki-read-6fb46c7db4-4lcnc                               1/1     Running   0          29m
loki-write-0                                             1/1     Running   0          29m
nebari-conda-store-minio-7f68f7f4c8-pcvhm                1/1     Running   0          29m
nebari-conda-store-postgresql-postgresql-0               1/1     Running   0          29m
nebari-conda-store-redis-master-0                        1/1     Running   0          29m
nebari-conda-store-server-649b9d499f-rqljn               1/1     Running   0          10m
nebari-conda-store-worker-547dc4899c-kjhqt               2/2     Running   0          10m
nebari-daskgateway-controller-9746b74bb-prp9c            1/1     Running   0          26m
nebari-daskgateway-gateway-85744f876f-mjckz              1/1     Running   0          26m
nebari-grafana-5f7f4cb8f4-82f55                          3/3     Running   0          28m
nebari-jupyterhub-sftp-68d8999fd7-w7hjz                  1/1     Running   0          29m
nebari-jupyterhub-ssh-675fbfdb95-2cszh                   1/1     Running   0          29m
nebari-kube-prometheus-sta-operator-77cbbffb7d-rx2cx     1/1     Running   0          28m
nebari-kube-state-metrics-65b8c8fd48-2k688               1/1     Running   0          28m
nebari-loki-minio-7b7cbdd87b-9d7zx                       1/1     Running   0          29m
nebari-prometheus-node-exporter-dpfxz                    1/1     Running   0          28m
nebari-promtail-l8kh6                                    1/1     Running   0          27m
nebari-traefik-ingress-75f6d994dd-qzjz6                  1/1     Running   0          33m
nebari-workflow-controller-5dd467bfc-p2qzd               1/1     Running   0          30m
nfs-server-nfs-6b8c9cd476-5dz7j                          1/1     Running   0          30m
prometheus-nebari-kube-prometheus-sta-prometheus-0       2/2     Running   0          27m
proxy-7bfb8c4885-tqtwk                                   1/1     Running   0          25m
user-scheduler-6fc686fbf9-9sjhv                          1/1     Running   0          25m
user-scheduler-6fc686fbf9-l95bw                          1/1     Running   0          25m
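
A couple of commands that may help narrow down whether conda-store-server itself is unhealthy or the request is failing at the ingress (the dev namespace and the health URL are taken from the output above):

# Check the conda-store-server logs for startup or database errors
kubectl logs deployment/nebari-conda-store-server -n dev --tail=100

# Probe the health endpoint directly, ignoring the self-signed certificate
curl -k -i https://172.18.1.100/conda-store/api/v1/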

@marcelovilla
Member

@shikanchen sorry about that, I just realized that while we updated Nebari to be compatible with the latest version of conda-store, we haven't cut a release with that change. We'll probably cut a release this week (and then the block above should work), but in the meantime, can you try this block instead?

conda_store:
  image: quay.io/aktech/conda-store-server
  image_tag: sha-558beb8

This image corresponds to the previous conda-store release that we supported, which was 2024.3.1. It is in a different quay repo (from another Nebari contributor) because conda-store was not building ARM images at the time.
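
After redeploying with this override, it may be worth confirming that the deployment actually picked up the new image; a small sketch, assuming the dev namespace:

# Show the image currently referenced by the conda-store-server deployment
kubectl get deployment nebari-conda-store-server -n dev -o jsonpath='{.spec.template.spec.containers[*].image}'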

@shikanchen
Author

@marcelovilla Thanks for your assistance! The deployment is now working correctly.

github-project-automation bot moved this from New 🚦 to Done 💪🏾 in 🪴 Nebari Project Management on Jan 21, 2025