Vespa version file in var still points to an older version after a bunch of successful upgrades and suddenly causes a container recreation to fail with cannot upgrade more than 30 versions #33214

nehajatav · 2025-01-29T19:08:37Z

Describe the bug
On Jan 23rd we did a series of upgrades to get from 8.406.26 to 8.466.24 on a two-node cluster, with one node running the config server and container and the other running the content node. Things have been running smoothly, and the logs only show that the configserver restarted successfully a couple of times on 8.466.24 after the weekend reboots on the 25th. Today we did a redeployment to recreate the podman container without nuking the data folders. However, the config server refused to come up, with this error:
Cannot upgrade from 8.406.26 to 8.466.24. ...... is too large (> 30 releases). Setting VESPA_SKIP_UPGRADE_CHECK=true will skip this check at your own risk, see https://vespa.ai/releases.html#versions\n\tat com.yahoo.vespa.config.server....

To Reproduce
Not sure if this can be reproduced

Expected behavior
8.466.24 should be detected as the previous version.

Environment (please complete the following information):

RHEL8v8
Infrastructure: Podman
Versions 4.4.1

Vespa version
While upgrading from 8.406.26 to 8.466.24

Additional context
Upon checking /opt/vespa/var/db/vespa/config_server/server_db/vespa_version today, we noticed it contains 8.406.26. How is it possible that this happened today when at least the last two reboots were fine? On the 27th we see that the cluster controller restarted a few times due to OOM; apart from that, no other service has restarted since the last host reboot on the 26th.


hmusum · 2025-01-30T14:08:34Z

The config server writes the file /opt/vespa/var/db/vespa/config_server/server_db/vespa_version and a node in ZooKeeper if it starts successfully (you can run vespa-zkcat /config/v2/vespa_version to see which version it stores in ZooKeeper). On startup, the config server compares the version it is running with the version it finds in ZooKeeper, or with the version in the vespa_version file if there is no data in ZooKeeper. It stops if the difference is more than 30 versions.
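As a rough illustration of the check described above, here is a minimal Python sketch. It is not the config server's actual code: the function names are hypothetical, and it assumes that "30 versions" is counted as the difference in the minor version number within the same major version.

```python
from typing import Optional, Tuple

# Assumption: the limit is counted in minor versions (8.406 -> 8.466 is 60).
MAX_MINOR_GAP = 30


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split a version string such as '8.406.26' into (major, minor, micro)."""
    major, minor, micro = (int(part) for part in text.strip().split("."))
    return major, minor, micro


def stored_version(zookeeper_value: Optional[str],
                   file_value: Optional[str]) -> Optional[str]:
    """Prefer the ZooKeeper value; use the vespa_version file only if ZooKeeper has no data."""
    if zookeeper_value:
        return zookeeper_value
    return file_value


def upgrade_allowed(previous: str, current: str, skip_check: bool = False) -> bool:
    """Return False when the minor-version gap exceeds MAX_MINOR_GAP.

    skip_check models VESPA_SKIP_UPGRADE_CHECK=true from the error message.
    Only same-major comparisons are modelled in this sketch.
    """
    if skip_check:
        return True
    prev_major, prev_minor, _ = parse_version(previous)
    cur_major, cur_minor, _ = parse_version(current)
    if prev_major != cur_major:
        raise ValueError("this sketch only models upgrades within one major version")
    return cur_minor - prev_minor <= MAX_MINOR_GAP


if __name__ == "__main__":
    # No data in ZooKeeper, stale value in the file: the situation reported here.
    previous = stored_version(None, "8.406.26")
    print(upgrade_allowed(previous, "8.466.24"))
```

Under these assumptions, an empty ZooKeeper combined with a stale vespa_version file is enough to make the check fail, even if the server previously ran 8.466.24.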

It's hard to say what has happened here, but some guesses:

The config server was upgraded but did not start successfully.
The container and config server nodes were somehow switched, so there was old state on the node that is now used as config server.
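One way to narrow this down is to look at the version file in the persisted data folder before recreating the container. The following is a small sketch of such a check, assuming the file contains just the version string; the helper names are hypothetical and the default path is the one mentioned in this issue.

```python
from pathlib import Path
from typing import Optional

# Path as reported in this issue; adjust if the data folder is mounted elsewhere.
VERSION_FILE = Path("/opt/vespa/var/db/vespa/config_server/server_db/vespa_version")


def read_version_file(path: Path = VERSION_FILE) -> Optional[str]:
    """Return the version recorded in the file, or None if it is missing or empty."""
    try:
        text = path.read_text().strip()
    except FileNotFoundError:
        return None
    return text or None


def is_stale(file_version: Optional[str], expected_version: str) -> bool:
    """True when the file records a version other than the one believed to be running."""
    return file_version is not None and file_version != expected_version


if __name__ == "__main__":
    recorded = read_version_file()
    print("recorded version:", recorded)
    print("stale:", is_stale(recorded, "8.466.24"))
```

If the file already shows the old version while the config server is still running fine, the stale state predates the container recreation.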

If you are able to reproduce this we can look into it; otherwise it's really hard to say what happened. We have never seen this before on our own servers, where we have upgraded hundreds of servers hundreds of times each.

hmusum self-assigned this Jan 30, 2025

hmusum added this to Support Jan 30, 2025