Config property for target system software release #7518

Open: wants to merge 6 commits into base: main

Conversation

plotnick
Contributor

Fixes #7280. Provides external API endpoints and corresponding datastore methods to get/set the current target_release config property. Releases are designated by their semantic version, and must be uploaded to the TUF repo depot prior to being set as the target release.

This PR does not attempt to actually upgrade the rack to the current target release. The reconfigurator changes to plan & execute such an upgrade will be handled as a follow-up.
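The rule in the description (a release must already be in the TUF repo depot before it can become the target) can be sketched in memory as follows. This is only an illustration under stated assumptions: `Version`, `UpdateConfig`, and `SetTargetError` are hypothetical names, not the datastore methods or types this PR adds.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a semantic version (major, minor, patch).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version(pub u64, pub u64, pub u64);

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum SetTargetError {
    /// The requested version has not been uploaded to the depot.
    UnknownRelease(Version),
}

/// In-memory model: the set of uploaded releases plus the current target.
#[derive(Default)]
pub struct UpdateConfig {
    uploaded: BTreeSet<Version>,
    target: Option<Version>,
}

impl UpdateConfig {
    /// Record that a release has been uploaded to the depot.
    pub fn upload(&mut self, version: Version) {
        self.uploaded.insert(version);
    }

    /// Get the current target release, if one has been set.
    pub fn target_release(&self) -> Option<&Version> {
        self.target.as_ref()
    }

    /// Set the target release, rejecting versions that were never uploaded.
    pub fn set_target_release(
        &mut self,
        version: Version,
    ) -> Result<(), SetTargetError> {
        if !self.uploaded.contains(&version) {
            return Err(SetTargetError::UnknownRelease(version));
        }
        self.target = Some(version);
        Ok(())
    }
}
```

Setting a target only records intent here, matching the note that actually upgrading the rack is left to follow-up work.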

@@ -885,6 +885,14 @@ authz_resource! {
    polar_snippet = FleetChild,
}

authz_resource! {
Contributor Author

I'm defining this authz resource, but am not convinced that I'm using it right (or at all). Do these only apply to resources that use the lookup* macros? Any guidance here would be appreciated.

Collaborator

I'm looking at the target blueprint as an analog here because it's similarly global.

It looks like we don't use authz_resource! for this. Instead, we have a synthetic resource called BlueprintConfig. It has its own type here and a singleton instance on which we manually implement PolarClass and AuthorizedResource. Then there's a snippet in omicron.polar about it.

That kind of seems like the way to go here. TargetRelease isn't a type of "resource" in the same way that I think the authz_resource! macro means it (i.e., something you could CRUD, that has an id, for which instances might be missing, etc.)
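For illustration, the singleton "synthetic resource" shape described above might look roughly like this. It is a simplified sketch, not the Nexus code: the `Authorize` trait, the `Role` and `Action` enums, and the permission table are all assumptions made for this example, and the real `BlueprintConfig` goes through `PolarClass`, `AuthorizedResource`, and a snippet in omicron.polar instead.

```rust
/// Simplified stand-in for an authorization trait (hypothetical; this is
/// NOT the real `AuthorizedResource` trait).
pub trait Authorize {
    fn resource_name(&self) -> &'static str;
    fn allows(&self, role: Role, action: Action) -> bool;
}

/// Hypothetical roles for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role {
    FleetAdmin,
    FleetViewer,
}

/// Hypothetical actions for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Action {
    Read,
    Modify,
}

/// Synthetic resource: it has no id, is never looked up, and cannot be
/// "not found", so there is nothing to CRUD.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TargetReleaseConfig;

/// The single instance that every authz check refers to.
pub const TARGET_RELEASE_CONFIG: TargetReleaseConfig = TargetReleaseConfig;

impl Authorize for TargetReleaseConfig {
    fn resource_name(&self) -> &'static str {
        "TargetReleaseConfig"
    }

    // Assumed policy for the sketch: admins may do anything, viewers may
    // only read.
    fn allows(&self, role: Role, action: Action) -> bool {
        match (role, action) {
            (Role::FleetAdmin, _) => true,
            (Role::FleetViewer, Action::Read) => true,
            (Role::FleetViewer, Action::Modify) => false,
        }
    }
}
```

Because there is exactly one instance, callers pass `TARGET_RELEASE_CONFIG` directly rather than performing a lookup first.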

    method = GET,
    path = "/v1/system/update/target-release",
    tags = ["system/update"],
    unpublished = true,
Contributor

Do you mind writing a comment as to why these two are unpublished? If there's a related GH issue, would you include it in the comment?

Contributor Author

There's no good reason, really; I just copied the API parameters from the other /v1/system/update endpoints (TUF repo depot). I can certainly mark these as published if we're happy with them; @iliana should we also publish the repository endpoints?

Contributor

I don't see a reason to publish the update endpoints right now; they don't work unless you have configured Nexus in a special way. I don't know what our general policy is on endpoints that customers shouldn't or can't poke at yet.

Contributor

There's a hidden tag. I use it to skip those endpoints when generating the Go SDK (not sure about the Rust or TypeScript ones), but the endpoints would still be available in the API. Perhaps we should leave these unpublished and add a comment explaining why.

Thoughts? @david-crespo @ahl

Contributor

david-crespo, Feb 12, 2025

If we want to merge these PRs but these endpoints definitely won't do anything for customers in the next release, I would lean toward unpublished. If we want our own clients (most likely the CLI) to be able to use them, we need them in the OpenAPI schema but can use hidden to keep them out of the docs. Sounds like the Go SDK leaves out hidden endpoints but I don't think that's true of all clients (e.g., the TS client generator does produce methods for hidden endpoints).
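As a rough summary of the distinction drawn in this thread, the three visibility levels can be modeled as below. The `Visibility` enum and its methods are hypothetical names for this sketch only; the mapping follows the comments here (unpublished endpoints stay out of the OpenAPI schema, hidden ones are in the schema but out of the docs), and whether a given SDK generator skips hidden endpoints is a per-generator choice that the sketch does not model.

```rust
/// Hypothetical model of endpoint visibility as discussed in this thread.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Visibility {
    /// In the OpenAPI schema and in the docs.
    Published,
    /// In the OpenAPI schema (so clients such as the CLI can use it),
    /// but kept out of the docs.
    Hidden,
    /// Served by the API but absent from the OpenAPI schema.
    Unpublished,
}

impl Visibility {
    /// Whether generated clients can see the endpoint at all.
    pub fn in_openapi_schema(self) -> bool {
        !matches!(self, Visibility::Unpublished)
    }

    /// Whether the endpoint shows up in the published docs.
    pub fn in_docs(self) -> bool {
        matches!(self, Visibility::Published)
    }
}
```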

Collaborator

I think this being unpublished is also preventing the coverage test from finding it, which seems bad.

Collaborator

davepacheco left a comment

Thanks! This is looking good and it's going to be key structure for next steps.

My biggest questions here are the two in nexus/types/src/external_api/shared.rs.

'system_version'
);

-- The software release that should be deployed to the rack.
Collaborator

Is this structured like bp_target, where it's a list of all previous configurations, and only the one with the latest generation number matters? If so I think it'd be useful to document that here.
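To make the question concrete, a history table where only the row with the highest generation is current could be modeled like this. It is a minimal sketch: `TargetReleaseRow`, `current_target`, and `append_target` are hypothetical names, the version is kept as a plain string, and starting the generation at 1 is an assumption for the sketch rather than what the migration in this PR does.

```rust
/// Hypothetical row shape: one entry per target release ever requested.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct TargetReleaseRow {
    pub generation: i64,
    /// `None` models "no specific release selected".
    pub system_version: Option<String>,
}

/// Only the row with the highest generation is the current target;
/// earlier rows are history.
pub fn current_target(rows: &[TargetReleaseRow]) -> Option<&TargetReleaseRow> {
    rows.iter().max_by_key(|row| row.generation)
}

/// Setting a new target appends a row with the next generation rather
/// than updating an existing row in place.
pub fn append_target(
    rows: &mut Vec<TargetReleaseRow>,
    system_version: Option<String>,
) -> i64 {
    // Assumption for this sketch: the first generation is 1.
    let next = current_target(rows.as_slice())
        .map_or(1, |row| row.generation + 1);
    rows.push(TargetReleaseRow { generation: next, system_version });
    next
}
```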

    generation INT8 NOT NULL PRIMARY KEY,
    time_requested TIMESTAMPTZ NOT NULL,
    release_source omicron.public.target_release_source NOT NULL,
    system_version STRING(64), -- "foreign key" into the `tuf_repo` table
Collaborator

The primary key of tuf_repo is a uuid. Should this be that? Or is it supposed to match the sha256 field of tuf_repo?

    release_source,
    system_version
) VALUES (
    0,
Collaborator

super nitty but I think I'd use 1 here just for consistency with other places we've used similar numbers (like OmicronZonesConfig generation and bp_target version).

    InstallDataset,

    /// Use the specified release of the rack's system software.
    SystemVersion(SemverVersion),
Collaborator

For the external API, I'd suggest we use either a TUF repo id or its SHA256 rather than a semver. I get that the human wants to know something more like the semver (although even then, I think the descriptor for them is an opaque token -- it could as well be "2025 Q1"). But I think it will be simpler and clearer to just pick one of the TUF repo's identifiers here so that it's very obvious what's going on.

@@ -510,3 +511,27 @@ impl RelayState {
            .context("json from relay state string")
    }
}

/// The source of the target release.
Collaborator

This is the target release for the whole system, right? From an operator's perspective, I think the InstallDataset variant would be better called "LastMupdate".

I wasn't thinking about providing this as an option. I can see we need a value to represent "it hasn't been set yet" and we might want to let people reset it to that. But what are the actual semantics of setting it to this value? Let's say the value was set to SystemVersion, Reconfigurator has updated a few zones, and now somebody comes and sets this to InstallDataset/LastMupdate. Does Reconfigurator actually undo the changes it made? I'd be inclined to say no; if you want that, you need to go mupdate again. In that case, though, this isn't really setting the target release to "last mupdate" or "a specific version"; it's setting or unsetting a target release. Maybe this should just be an Option<SystemVersion>?
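A small sketch of how the two shapes relate, assuming a string-backed `SystemVersion` newtype (hypothetical here; the PR wraps a `SemverVersion`). The `into_option` helper is illustrative only and is not part of the PR:

```rust
/// Hypothetical newtype standing in for the PR's version type.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SystemVersion(pub String);

/// The enum shape under review.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum TargetReleaseSource {
    /// No target chosen; whatever the last mupdate installed stays.
    InstallDataset,
    /// Use the specified release of the rack's system software.
    SystemVersion(SystemVersion),
}

impl TargetReleaseSource {
    /// The suggested alternative: "set or unset" expressed as an `Option`.
    pub fn into_option(self) -> Option<SystemVersion> {
        match self {
            TargetReleaseSource::InstallDataset => None,
            TargetReleaseSource::SystemVersion(version) => Some(version),
        }
    }
}
```

The two forms carry the same information; the `Option` form just makes it explicit that `None` means "no target release is set" rather than "roll back to the install dataset".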

/// rack should eventually correspond to the release described here.
#[endpoint {
method = GET,
path = "/v1/system/update/target-release",
Collaborator

Thoughts on making this /v1/system/target-release and calling the operation target_release_get (or view)?

Comment on lines +70 to +74
self.update_tuf_repo_get(opctx, system_version)
    .await
    .map_err(|e| err.bail(e))?
    .repo
    .system_version,
Collaborator

We don't want to panic here, right?

If we were doing this in raw SQL I might do:

INSERT INTO target_release (id, ...) VALUES ((SELECT id FROM tuf_repo WHERE system_version = ...), ...)

This would avoid an interactive transaction and reduce contention a lot but I guess it's fairly painful to do in Diesel. We could also fetch the id outside of the transaction but I suppose we still have to check that it's still there so it doesn't help much.

@@ -1046,6 +1046,7 @@ pub enum ResourceType {
    Oximeter,
    MetricProducer,
    RoleBuiltin,
    TargetRelease,
Collaborator

Where does this get used? Is it that the authz_resource! macro expected it?

I wouldn't really expect this to be needed because I think it's mainly used for generating 404s and you can't get a 404 on a target release.


-- The software release that should be deployed to the rack.
CREATE TABLE IF NOT EXISTS omicron.public.target_release (
    generation INT8 NOT NULL PRIMARY KEY,
Collaborator

I'd be inclined to call this version. That's what we used in bp_target. But we do use both version and generation in different places, and I'm not really sure I could explain when to use which. Either seems okay.

Development

Successfully merging this pull request may close these issues.

config property for target system version
7 participants