Config property for target system software release #7518
@@ -885,6 +885,14 @@ authz_resource! {
    polar_snippet = FleetChild,
}

authz_resource! {
I'm defining this authz resource, but am not convinced that I'm using it right (or at all). Do these only apply to resources that use the `lookup*` macros? Any guidance here would be appreciated.
I'm looking at the target blueprint as an analog here because it's similarly global.
It looks like we don't use `authz_resource!` for this. Instead, we have a synthetic resource called `BlueprintConfig`. It has its own type here and a singleton instance on which we manually implement `PolarClass` and `AuthorizedResource`. Then there's a snippet in omicron.polar about it.
That kind of seems like the way to go here. `TargetRelease` isn't a type of "resource" in the same way that I think the `authz_resource!` macro means it (i.e., something you could CRUD, that has an id, for which instances might be missing, etc.)
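To make the "synthetic singleton resource" idea above concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical: the `AuthorizedResource` trait below is a simplified local stand-in (the real `PolarClass` and `AuthorizedResource` traits in omicron have different signatures and delegate to the Polar policy), and the `Role`/`Action` types and the read/modify policy are assumptions made only for illustration.

```rust
// Hypothetical stand-ins: the real authz traits in omicron are richer and
// delegate the decision to the policy in omicron.polar.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Action {
    Read,
    Modify,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role {
    FleetViewer,
    FleetAdmin,
}

/// Simplified stand-in for an authorization trait.
pub trait AuthorizedResource {
    fn is_allowed(&self, role: Role, action: Action) -> bool;
}

/// Synthetic resource: it has no id and no lookup path, so it can never be
/// "not found". There is exactly one of it per system.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TargetReleaseConfig;

/// The singleton instance that every authorization check refers to.
pub const TARGET_RELEASE_CONFIG: TargetReleaseConfig = TargetReleaseConfig;

impl AuthorizedResource for TargetReleaseConfig {
    fn is_allowed(&self, role: Role, action: Action) -> bool {
        // Assumed policy for illustration: viewers may read, admins may do both.
        match (role, action) {
            (Role::FleetAdmin, _) => true,
            (Role::FleetViewer, Action::Read) => true,
            (Role::FleetViewer, Action::Modify) => false,
        }
    }
}
```

Because the type carries no identifier, callers always check against the one constant rather than looking an instance up first, which is the property that distinguishes it from the CRUD-style resources described above.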
method = GET,
path = "/v1/system/update/target-release",
tags = ["system/update"],
unpublished = true,
Do you mind writing a comment as to why these two are unpublished? If there's a related GH issue, would you include it in the comment?
There's no good reason, really; I just copied the API parameters from the other `/v1/system/update` endpoints (TUF repo depot). I can certainly mark these as published if we're happy with them; @iliana should we also publish the repository endpoints?
I don't see a reason to publish the updates endpoints right now; they don't work unless you have configured Nexus in a special way. I don't know what our general policy is on endpoints that customers shouldn't/can't poke at yet.
There's a `hidden` tag. I use it to skip those endpoints when generating the Go SDK (not sure about the Rust or TypeScript one), but it would still be available in the API. Perhaps we should leave these unpublished and leave a comment explaining why.
Thoughts? @david-crespo @ahl
If we want to merge these PRs but these endpoints definitely won't do anything for customers in the next release, I would lean toward unpublished. If we want our own clients (most likely the CLI) to be able to use them, we need them in the OpenAPI schema but can use hidden to keep them out of the docs. Sounds like the Go SDK leaves out hidden endpoints but I don't think that's true of all clients (e.g., the TS client generator does produce methods for hidden endpoints).
I think this being unpublished is also preventing the coverage test from finding it, which seems bad.
Thanks! This is looking good and it's going to be key structure for next steps.
My biggest questions here are the two in nexus/types/src/external_api/shared.rs.
'system_version'
);

-- The software release that should be deployed to the rack.
Is this structured like `bp_target`, where it's a list of all previous configurations, and only the one with the latest generation number matters? If so I think it'd be useful to document that here.
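If the table does follow the append-only pattern asked about above (the thread does not confirm it), reading the current target amounts to picking the row with the highest generation. The sketch below illustrates that under stated assumptions: `TargetReleaseRow`, `current_target`, and `next_generation` are hypothetical names, the row carries only two of the columns, and starting at generation 1 for an empty history is an assumption rather than what the PR does.

```rust
/// Hypothetical in-memory mirror of a `target_release` row; only the two
/// columns needed for the illustration are included.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct TargetReleaseRow {
    pub generation: i64,
    pub system_version: Option<String>,
}

/// Every row ever written is kept as history; only the row with the highest
/// generation describes the current target.
pub fn current_target(rows: &[TargetReleaseRow]) -> Option<&TargetReleaseRow> {
    rows.iter().max_by_key(|row| row.generation)
}

/// Generation to use for the next row. Starting at 1 when there is no
/// history is an assumption made for this sketch.
pub fn next_generation(rows: &[TargetReleaseRow]) -> i64 {
    current_target(rows).map_or(1, |row| row.generation + 1)
}
```

Setting a new target under this scheme never updates a row in place; it appends a row whose generation is `next_generation(..)`, and the primary key on `generation` rejects two writers that race for the same number.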
generation INT8 NOT NULL PRIMARY KEY,
time_requested TIMESTAMPTZ NOT NULL,
release_source omicron.public.target_release_source NOT NULL,
system_version STRING(64), -- "foreign key" into the `tuf_repo` table
The primary key of `tuf_repo` is a uuid. Should this be that? Or is it supposed to match the `sha256` field of `tuf_repo`?
release_source,
system_version
) VALUES (
0,
super nitty but I think I'd use `1` here just for consistency with other places we've used similar numbers (like OmicronZonesConfig generation and `bp_target` version).
InstallDataset,

/// Use the specified release of the rack's system software.
SystemVersion(SemverVersion),
For the external API, I'd suggest we use either a TUF repo id or its SHA256 rather than a semver. I get that the human wants to know something more like the semver (although even then, I think the descriptor for them is an opaque token -- it could as well be "2025 Q1"). But I think it will be simpler and clearer to just pick one of the TUF repo's identifiers here so that it's very obvious what's going on.
@@ -510,3 +511,27 @@ impl RelayState {
            .context("json from relay state string")
    }
}

/// The source of the target release.
This is the target release for the whole system, right? From an operator's perspective, I think the `InstallDataset` variant would be better called "LastMupdate".
I wasn't thinking about providing this as an option. I can see we need a value to represent "it hasn't been set yet" and we might want to let people reset it to that. But what are the actual semantics of setting it to this value? Let's say the value was set to `SystemVersion` and Reconfigurator has updated a few zones and now somebody comes and sets this to `InstallDataset`/`LastMupdate`. Does Reconfigurator actually undo the changes it made? I'd be inclined to say no, if you want that, you need to go mupdate again. In that case though this isn't really setting the target release to "last mupdate" or "a specific version", it's setting or unsetting a target release. Maybe this should just be an `Option<SystemVersion>`?
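A small sketch of the two shapes being compared may help. It is illustrative only: `SystemVersion` here wraps a plain `String` instead of the `SemverVersion` used in the PR, and `to_option` and `describe` are hypothetical helpers, not part of the change under review.

```rust
/// Stand-in for the version type; the PR uses `SemverVersion`, a plain
/// `String` is used here to keep the sketch dependency-free.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SystemVersion(pub String);

/// Roughly the shape proposed in the PR: an explicit two-variant source.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum TargetReleaseSource {
    InstallDataset,
    SystemVersion(SystemVersion),
}

/// The alternative raised in review: "no target set" is just `None`.
pub fn to_option(source: TargetReleaseSource) -> Option<SystemVersion> {
    match source {
        TargetReleaseSource::InstallDataset => None,
        TargetReleaseSource::SystemVersion(v) => Some(v),
    }
}

/// Hypothetical helper spelling out the "set or unset" reading.
pub fn describe(target: &Option<SystemVersion>) -> String {
    match target {
        None => "no target release set".to_string(),
        Some(SystemVersion(v)) => format!("target release {v}"),
    }
}
```

The conversion is lossless in both directions, so the choice is about what the API communicates: the enum suggests the install dataset is itself a target to converge on, while the `Option` form says only that a target is either set or unset.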
/// rack should eventually correspond to the release described here.
#[endpoint {
    method = GET,
    path = "/v1/system/update/target-release",
Thoughts on making this `/v1/system/target-release` and calling the operation `target_release_get` (or view)?
self.update_tuf_repo_get(opctx, system_version)
    .await
    .map_err(|e| err.bail(e))?
    .repo
    .system_version,
We don't want to panic here, right?
If we were doing this in raw SQL I might do:
INSERT INTO target_release (id, ...) VALUES ((SELECT id FROM tuf_repo WHERE system_version = ...), ...)
This would avoid an interactive transaction and reduce contention a lot but I guess it's fairly painful to do in Diesel. We could also fetch the id outside of the transaction but I suppose we still have to check that it's still there so it doesn't help much.
@@ -1046,6 +1046,7 @@ pub enum ResourceType {
    Oximeter,
    MetricProducer,
    RoleBuiltin,
    TargetRelease,
Where does this get used? Is it that the `authz_resource!` macro expected it?
I wouldn't really expect this to be needed because I think it's mainly used for generating 404s and you can't get a 404 on a target release.

-- The software release that should be deployed to the rack.
CREATE TABLE IF NOT EXISTS omicron.public.target_release (
    generation INT8 NOT NULL PRIMARY KEY,
I'd be inclined to call this `version`. That's what we used in `bp_target`. But we do use both `version` and `generation` in different places and I'm not really sure I could explain when to use which. Either seems okay.
Fixes #7280. Provides external API endpoints and corresponding datastore methods to get/set the current `target_release` config property. Releases are designated by their semantic version, and must be uploaded to the TUF repo depot prior to being set as the target release.
This PR does not attempt to actually upgrade the rack to the current target release. The reconfigurator changes to plan & execute such an upgrade will be handled as a follow-up.