[BUG] - Deploying locally on Mac OS, Unable to pull conda-store-server Image #2918
Comments
I think @marcelovilla has encountered a similar problem recently; if it's the same, the issue was that the most recent conda-store release's docker images didn't have the proper tagging scheme for the ARM images. The workaround back then was to manually pass the sha hash instead of the label. You will need to make a quick change in the deployment config manifest; here's a command for you to try:

kubectl set image deployment/nebari-conda-store-worker conda-store-server=<new-image> --namespace=<namespace>

This will soon be addressed by the conda-store team, but in the meantime the above should be a good workaround.
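For illustration only, here is what that command could look like with concrete values filled in; the dev namespace and the sha-f8875ca tag are taken from later in this thread and are assumptions, not an official recommendation:

  # Hedged example: point the worker container at an explicitly tagged build.
  # Namespace "dev" and tag "sha-f8875ca" are assumptions based on this thread.
  kubectl set image deployment/nebari-conda-store-worker \
    conda-store-server=quay.io/quansight/conda-store-server:sha-f8875ca \
    --namespace=dev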
@shikanchen can you try adding this block in your conda_store:

  image: quay.io/quansight/conda-store-server
  image_tag: sha-f8875ca

As @viniciusdc mentioned, the conda-store images are not being properly tagged for ARM, so you have to specify a hash for the specific build. You can find all the images at https://quay.io/repository/quansight/conda-store-server?tab=tags.
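For reference, in nebari-config.yaml that block sits nested under the conda_store key, roughly like this (a sketch; any other conda_store settings you already have stay as they are):

  # Sketch of the relevant section of nebari-config.yaml
  conda_store:
    image: quay.io/quansight/conda-store-server
    image_tag: sha-f8875ca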
@marcelovilla Thank you for the suggested fix. I applied it to update the image and tag for conda-store-server in the nebari-config.yaml file, and after redeploying, the deployment successfully passed the image-pulling step. However, the conda-store health check is now failing consistently. The issue persists after trying older released images from https://quay.io/repository/quansight/conda-store-server?tab=tags. The conda-store-server pod shows as running (READY 1/1), but it consistently fails the health check for https://172.18.1.100/conda-store/api/v1/. Here's the output:

[tofu]: Apply complete! Resources: 1 added, 2 changed, 0 destroyed.
[tofu]:
[tofu]: Outputs:
[tofu]:
[tofu]: forward-auth-middleware = {
[tofu]: "name" = "traefik-forward-auth"
[tofu]: }
[tofu]: forward-auth-service = {
[tofu]: "name" = "forwardauth-service"
[tofu]: }
[tofu]: service_urls = {
[tofu]: "argo-workflows" = {
[tofu]: "health_url" = "https://172.18.1.100/argo/"
[tofu]: "url" = "https://172.18.1.100/argo/"
[tofu]: }
[tofu]: "conda_store" = {
[tofu]: "health_url" = "https://172.18.1.100/conda-store/api/v1/"
[tofu]: "url" = "https://172.18.1.100/conda-store/"
[tofu]: }
[tofu]: "dask_gateway" = {
[tofu]: "health_url" = "https://172.18.1.100/gateway/api/version"
[tofu]: "url" = "https://172.18.1.100/gateway/"
[tofu]: }
[tofu]: "jupyterhub" = {
[tofu]: "health_url" = "https://172.18.1.100/hub/api/"
[tofu]: "url" = "https://172.18.1.100/"
[tofu]: }
[tofu]: "keycloak" = {
[tofu]: "health_url" = "https://172.18.1.100/auth/realms/master"
[tofu]: "url" = "https://172.18.1.100/auth/"
[tofu]: }
[tofu]: "monitoring" = {
[tofu]: "health_url" = "https://172.18.1.100/monitoring/api/health"
[tofu]: "url" = "https://172.18.1.100/monitoring/"
[tofu]: }
[tofu]: }
Attempt 1 health check succeeded for url=https://172.18.1.100/argo/
Attempt 1 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 2 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 3 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 4 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 5 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 6 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 7 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 8 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 9 health check failed for url=https://172.18.1.100/conda-store/api/v1/
Attempt 10 health check failed for url=https://172.18.1.100/conda-store/api/v1/
ERROR: Service conda_store DOWN when checking url=https://172.18.1.100/conda-store/api/v1/

The pod status I pulled after the issue occurred:

(nebari) ➜ ~ kubectl get pods -n dev
NAME READY STATUS RESTARTS AGE
alertmanager-nebari-kube-prometheus-sta-alertmanager-0 2/2 Running 0 27m
argo-workflows-server-585dd7f586-brc6h 1/1 Running 0 30m
argo-workflows-workflow-controller-586dcfd8f7-5tcc5 1/1 Running 0 30m
continuous-image-puller-vg8mx 1/1 Running 0 25m
forwardauth-deployment-7975cf64db-9f86t 1/1 Running 0 30m
hub-9d4c94bcd-k78zs 1/1 Running 0 25m
keycloak-0 1/1 Running 0 32m
keycloak-postgresql-0 1/1 Running 0 32m
loki-backend-0 2/2 Running 0 29m
loki-canary-5jf9v 1/1 Running 0 29m
loki-gateway-bf4d7b485-zfcxn 1/1 Running 0 29m
loki-read-6fb46c7db4-4lcnc 1/1 Running 0 29m
loki-write-0 1/1 Running 0 29m
nebari-conda-store-minio-7f68f7f4c8-pcvhm 1/1 Running 0 29m
nebari-conda-store-postgresql-postgresql-0 1/1 Running 0 29m
nebari-conda-store-redis-master-0 1/1 Running 0 29m
nebari-conda-store-server-649b9d499f-rqljn 1/1 Running 0 10m
nebari-conda-store-worker-547dc4899c-kjhqt 2/2 Running 0 10m
nebari-daskgateway-controller-9746b74bb-prp9c 1/1 Running 0 26m
nebari-daskgateway-gateway-85744f876f-mjckz 1/1 Running 0 26m
nebari-grafana-5f7f4cb8f4-82f55 3/3 Running 0 28m
nebari-jupyterhub-sftp-68d8999fd7-w7hjz 1/1 Running 0 29m
nebari-jupyterhub-ssh-675fbfdb95-2cszh 1/1 Running 0 29m
nebari-kube-prometheus-sta-operator-77cbbffb7d-rx2cx 1/1 Running 0 28m
nebari-kube-state-metrics-65b8c8fd48-2k688 1/1 Running 0 28m
nebari-loki-minio-7b7cbdd87b-9d7zx 1/1 Running 0 29m
nebari-prometheus-node-exporter-dpfxz 1/1 Running 0 28m
nebari-promtail-l8kh6 1/1 Running 0 27m
nebari-traefik-ingress-75f6d994dd-qzjz6 1/1 Running 0 33m
nebari-workflow-controller-5dd467bfc-p2qzd 1/1 Running 0 30m
nfs-server-nfs-6b8c9cd476-5dz7j 1/1 Running 0 30m
prometheus-nebari-kube-prometheus-sta-prometheus-0 2/2 Running 0 27m
proxy-7bfb8c4885-tqtwk 1/1 Running 0 25m
user-scheduler-6fc686fbf9-9sjhv 1/1 Running 0 25m
user-scheduler-6fc686fbf9-l95bw 1/1 Running 0 25m
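A couple of generic commands that could help narrow down why the health check fails even though the pod reports Running (a sketch; the dev namespace comes from the output above, and -k is only there because a local deployment typically uses a self-signed certificate):

  # Check the conda-store-server logs for startup or configuration errors.
  kubectl logs -n dev deployment/nebari-conda-store-server --tail=100

  # Hit the failing health endpoint directly and inspect the status code/body.
  curl -k https://172.18.1.100/conda-store/api/v1/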
@shikanchen sorry about that, I just realized that while we updated Nebari to be compatible with the latest version of conda-store, we haven't cut a release with that change. We'll probably cut a release this week (and then the above block should work), but in the meantime, can you try this block instead?

  conda_store:
    image: quay.io/aktech/conda-store-server
    image_tag: sha-558beb8

This image corresponds to the previous conda-store release that we supported.
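If useful, after changing the block and redeploying, one way to confirm the server actually picked up the new image (a sketch, assuming the dev namespace used above):

  # Show the image currently configured on the conda-store-server deployment.
  kubectl get deployment nebari-conda-store-server -n dev \
    -o jsonpath='{.spec.template.spec.containers[*].image}'

  # Wait for the rollout to finish.
  kubectl rollout status deployment/nebari-conda-store-server -n dev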
@marcelovilla Thanks for your assistance! The deployment is now working correctly.
Describe the bug
conda-store-server and conda-store-worker fail to roll out during a local deployment on macOS with Nebari; the deploy hangs at Still creating... for these resources because the conda-store-server image cannot be pulled.

Expected behavior
- The conda-store-server and conda-store-worker rollouts should complete successfully.
- The conda-store-server and conda-store-worker pods should run and become Ready.
- The linux/arm64 image should be pulled without errors.

OS and architecture in which you are running Nebari
OS: macOS Ventura 13.5.2, and Architecture: ARM64 (Apple Silicon)
How to Reproduce the problem?
1. Install Nebari: conda install nebari -c conda-forge. I attached the details of the environment I use for deploying Nebari in the Anything else section.
2. Create the configuration file (nebari-config.yaml) for a local deployment on macOS (a typical command sequence is sketched below).
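For context, a typical command sequence for such a local deployment looks roughly like this (a sketch; the exact init flags and values are not taken from this report, see nebari init --help):

  # Install the Nebari CLI (as in step 1).
  conda install nebari -c conda-forge

  # Generate nebari-config.yaml for the local provider (flags omitted here).
  nebari init local

  # Deploy using the generated configuration.
  nebari deploy -c nebari-config.yaml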
Command output
Versions and dependencies used.
Compute environment
None
Integrations
conda-store
Anything else?